FROM THE SCHRÖDINGER PROBLEM TO THE MONGE-KANTOROVICH PROBLEM
Abstract. The aim of this article is to show that the Monge-Kantorovich problem is the limit of a sequence of entropy minimization problems when a fluctuation parameter tends down to zero. We prove the convergence of the entropic values to the optimal transport cost as the fluctuations decrease to zero, and we also show that the limit points of the entropic minimizers are optimal transport plans. We investigate the dynamic versions of these problems by considering random paths and describe the connections between the dynamic and static problems. The proofs are essentially based on convex and functional analysis. We also need specific properties of Γ-convergence which we didn't find in the literature. Hence we prove these Γ-convergence results, which are interesting in their own right.
1. Introduction

The aim of this article is to describe a link between the Monge-Kantorovich optimal 
transport problem and a sequence of entropy minimization problems. We show that the 
Monge-Kantorovich problem is the limit of this sequence when a fluctuation parameter 
tends down to zero. More precisely, we prove that the entropic values tend to the optimal 
cost as the fluctuations decrease to zero, and also that the limit points of the entropic 
minimizers are optimal transport plans. We also investigate the dynamic versions of these 
problems by considering random paths. 

Our main results are stated in Section 3: they are Theorems 3.3, 3.6 and 3.7. Although the assumptions of these results are in terms of large deviation principle, it is not necessary to be acquainted with this theory, or even with probability theory, to read this article. It is written for analysts and we tried as much as possible to formulate the probabilistic notions in terms of analysis and measure theory. A short reminder of the basic definitions and results of large deviation theory is given in the Appendix.

In its Kantorovich form, the optimal transport problem dates back to the 1940s, see [Kan42, Kan48]. It appears that its entropic approximation has its roots in an even older problem which was addressed by Schrödinger in the early 1930s in connection with the newly born wave mechanics, see [Sch32].

The Monge-Kantorovich optimal transport problem is about finding the best way of transporting some given mass distribution onto another one. We describe these mass distributions by means of two probability measures on a state space $X$: the initial one is called $\mu_0\in P(X)$ and the final one $\mu_1\in P(X)$, where $P(X)$ is the set of all probability measures on $X$. The rules of the game are (i): it costs $c(x,y)\in[0,\infty]$ to transport a unit mass from $x$ to $y$ and (ii): it is possible to transport infinitesimal portions of mass from the $\mu_0$-configuration to the $\mu_1$-configuration. The resulting minimization problem is the celebrated Monge-Kantorovich problem

$$\int_{X^2} c\,d\pi \to \min;\qquad \pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1 \qquad (1)$$

where $P(X^2)$ is the set of all probability measures on $X^2$ and $\pi_0,\pi_1\in P(X)$ are respectively the first and second marginal measures of the joint probability measure $\pi\in P(X^2)$. Optimal transport is an active field of research. For a remarkable overview of this exciting topic, see Villani's textbook [Vil09] and the references therein.
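When $\mu_0$ and $\mu_1$ are finitely supported, (1) is a finite-dimensional linear program in the entries of $\pi$. The sketch below is only an illustration under that assumption and is not a method used in this article; the function name `kantorovich_lp` and the toy data are hypothetical, and the solver is SciPy's `linprog` with its HiGHS backend.

```python
import numpy as np
from scipy.optimize import linprog


def kantorovich_lp(mu0, mu1, cost):
    """Solve the discrete Monge-Kantorovich problem (1) as a linear program.

    mu0: (m,) weights, mu1: (n,) weights, cost: (m, n) finite costs c(x_i, y_j).
    Returns the optimal plan pi of shape (m, n) and the optimal cost.
    """
    m, n = cost.shape
    # pi is flattened row-major: entry pi[i, j] is variable i * n + j.
    rows = np.kron(np.eye(m), np.ones((1, n)))  # sum_j pi[i, j] = mu0[i]
    cols = np.kron(np.ones((1, m)), np.eye(n))  # sum_i pi[i, j] = mu1[j]
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([rows, cols]),
        b_eq=np.concatenate([mu0, mu1]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(m, n), res.fun


# Hypothetical toy data: uniform masses at {0, 1} and {2, 3}, cost |y - x|^2 / 2.
xs = np.array([0.0, 1.0])
ys = np.array([2.0, 3.0])
mu0 = np.array([0.5, 0.5])
mu1 = np.array([0.5, 0.5])
cost = 0.5 * (ys[None, :] - xs[:, None]) ** 2
plan, value = kantorovich_lp(mu0, mu1, cost)
# Expected: the monotone plan diag(0.5, 0.5), with optimal cost 2.0.
```

The two families of equality constraints are linearly dependent (both force total mass 1); the HiGHS solver tolerates this redundancy.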

Now, let us have a look at Schrödinger's problem. Suppose that you observe a very large number of non-interacting indistinguishable particles which are approximately distributed according to a probability measure $\mu_0\in P(X)$ on the state space $X$. We view $\mu_0$ as the initial configuration of the whole particle system. Suppose that you know that the dynamics of each individual particle is driven by a stochastic process whose law is $R^k\in P(\Omega)$, i.e. a probability measure on the space

$$\Omega = X^{[0,1]}$$

of all paths¹ from the time interval $[0,1]$ to the state space $X$. The parameter $k$ describes the fluctuation level $1/k$. As $k$ tends to infinity, $R^k$ tends to some deterministic dynamics: $R^\infty$ describes a deterministic flow. As a typical example, one can think of $R^k$ as the law of a Brownian motion with diffusion coefficient $1/k$. Knowing this dynamics, the law of large numbers tells you that, with a very high probability, you should expect to see the configuration of the large particle system at the final time $t=1$ not very far from some expected configuration. Now, suppose that you observe the system in a configuration close to some $\mu_1\in P(X)$ which is far from the expected one. Schrödinger's question is: "Conditionally on this very rare event, what is the most likely path of the whole system between the times $t=0$ and $t=1$?" As will be seen in Section 2, the answer to this question is related to the entropy minimization problem

$$\frac1k H(P|R^k)\to\min;\qquad P\in P(\Omega):\ P_0=\mu_0,\ P_1=\mu_1 \qquad (2)$$

where $H(P|R^k)$ is the relative entropy of $P$ with respect to the reference stochastic process $R^k$, and the renormalization factor $1/k$ is here to prevent the entropy from exploding as the fluctuations of $R^k$ decrease. Recall that $H(P|R):=\int_\Omega\log\big(\frac{dP}{dR}\big)\,dP\in[0,\infty]$ for any $P,R\in P(\Omega)$. Schrödinger's problem looks like the Monge-Kantorovich one not only because of $\mu_0$ and $\mu_1$, but also because of some cost of transportation. Indeed, if the random dynamics creates a trend to move in some direction rather than in another one, it costs less to the particle system to end up in some configurations $\mu_1$ than in others. Even if no direction is favoured, we shall see that the structure of the random fluctuations, which is described by the sequence $(R^k)_{k\ge1}$, encodes some zero-fluctuation cost function $c$ on $X^2$.

¹ During the rigorous treatment, we shall only consider subspaces $\Omega$ of $X^{[0,1]}$, for instance the subspace of all continuous paths.

Remark that, although $1/k$ should be of the order of Planck's constant $h$ to build a Euclidean analogue of the quantum dynamics, in [Sch32] Schrödinger isn't concerned with the semiclassical limit $k\to\infty$. Let us also mention that Schrödinger's paper is the starting point of the Euclidean quantum mechanics which was developed by Zambrini [CZ08].

An informal presentation of the convergence result. Our assumption is that $(R^k)_{k\ge1}$ satisfies a large deviation principle in the path space $\Omega$. This roughly means that

$$R^k(A)\asymp\exp\Big(-k\inf_{\omega\in A}C(\omega)\Big), \qquad (3)$$

for some rate function $C:\Omega\to[0,\infty]$ and a large class of measurable subsets $A\subset\Omega$. For a rigorous definition of a large deviation principle and basic results about large deviation theory, see the Appendix. Very informally, the most likely paths $\omega$ correspond to high values of $R^k(d\omega)$ and therefore to low values of $C(\omega)$. Under endpoint constraints, it shouldn't be surprising to meet the following family of geodesic problems

$$C(\omega)\to\min;\qquad\omega\in\Omega:\ \omega_0=x,\ \omega_1=y$$

where $x,y$ describe $X$, and $\omega_0$ and $\omega_1$ are the initial and final positions of the path $\omega$. We see that the large deviation behavior of the sequence $(R^k)_{k\ge1}$ brings us a family of geodesic paths. It will be shown that the limit (in some sense to be made precise) of the problems (2) is the Monge-Kantorovich problem with the "geodesic" cost function

$$c(x,y)=\inf\{C(\omega);\ \omega\in\Omega:\ \omega_0=x,\ \omega_1=y\},\qquad x,y\in X. \qquad (4)$$

For instance, if $R^k$ is the law of a Brownian motion on $X=\mathbb R^d$ with diffusion coefficient $1/k$, the rate function $C$ is given by Schilder's theorem, a standard large deviation result which tells us that $C$ is the classical kinetic action functional, given for any path $\omega$ by

$$C(\omega)=\frac12\int_{[0,1]}|\dot\omega_t|^2\,dt\in[0,\infty]$$

if $\omega=(\omega_t)_{0\le t\le1}$ is absolutely continuous ($\dot\omega$ is its time derivative), and $+\infty$ otherwise. The corresponding static cost is the standard quadratic cost

$$c(x,y)=\frac{|y-x|^2}{2},\qquad x,y\in\mathbb R^d.$$
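To see numerically why Schilder's action yields the quadratic cost through (4), one can minimize a time-discretized version of $C$ over paths with fixed endpoints. This is a hedged sketch under hypothetical choices ($d=1$, a uniform grid of `n_steps` time steps, helper names of our own); the discrete minimizer should be the constant-velocity path, with action $|y-x|^2/2$.

```python
import numpy as np
from scipy.optimize import minimize


def discrete_action(interior, x, y, n_steps):
    """Discretized action (1/2) * sum_j |v_j|^2 dt of the path x -> interior points -> y."""
    path = np.concatenate([[x], interior, [y]])
    v = np.diff(path) * n_steps  # velocities on steps of length dt = 1 / n_steps
    return 0.5 * np.sum(v**2) / n_steps


def discrete_action_grad(interior, x, y, n_steps):
    """Gradient of discrete_action with respect to the interior points."""
    path = np.concatenate([[x], interior, [y]])
    v = np.diff(path) * n_steps
    return v[:-1] - v[1:]


def geodesic_cost(x, y, n_steps=20):
    """Minimize the discretized action over 1D paths from x to y; returns (value, path)."""
    start = np.full(n_steps - 1, float(x))  # deliberately non-geodesic initial path
    res = minimize(discrete_action, start, args=(x, y, n_steps),
                   jac=discrete_action_grad, method="BFGS")
    return res.fun, np.concatenate([[x], res.x, [y]])


c_value, path = geodesic_cost(0.0, 3.0)
# c_value should be close to |3 - 0|^2 / 2 = 4.5 and path close to np.linspace(0, 3, 21).
```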

As a consequence of our general results, we obtain that if the quadratic cost transport problem admits a unique solution, for instance when $\int_X|x|^2\,\mu_0(dx),\ \int_X|y|^2\,\mu_1(dy)<\infty$ and $\mu_0$ is absolutely continuous, then the sequence $(P^k)_{k\ge1}$ of the diffusion processes which are the unique solutions to (2) as $k$ varies converges to the deterministic process

$$\widehat P(\cdot)=\int_{X^2}\delta_{\gamma^{xy}}(\cdot)\,\widehat\pi(dxdy)\in P(\Omega)$$

where, for each $x,y\in X$, $\gamma^{xy}$ is the constant velocity geodesic between $x$ and $y$, $\delta_{\gamma^{xy}}$ is the Dirac measure at $\gamma^{xy}$, and $\widehat\pi\in P(X^2)$ is the unique solution to the quadratic cost Monge-Kantorovich transport problem (1). The marginal flow of $\widehat P$ is defined to be $(\widehat P_t)_{0\le t\le1}$ where, for each $0\le t\le1$, $\widehat P_t=\int_{X^2}\delta_{\gamma^{xy}_t}(\cdot)\,\widehat\pi(dxdy)\in P(X)$ is the law of the random position at time $t$ when the law of the whole random path is $\widehat P\in P(\Omega)$. This flow is precisely the displacement interpolation between $\mu_0$ and $\mu_1$ with respect to the quadratic cost transport problem, see [Vil09, Chapter 7] for this notion.
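For a finitely supported optimal plan in $\mathbb R^d$, the marginal flow above is easy to write down: each pair $(x,y)$ charged by the plan moves along the segment $\gamma^{xy}_t=(1-t)x+ty$. A minimal sketch, assuming the plan is given as an array (the function name and data are hypothetical):

```python
import numpy as np


def displacement_interpolation(xs, ys, plan, t):
    """Atoms and weights of P_t = sum_ij plan[i, j] * delta_{(1 - t) x_i + t y_j}.

    xs: (m, d) support of mu0, ys: (n, d) support of mu1, plan: (m, n) transport plan.
    """
    i, j = np.nonzero(plan > 0)  # pairs (x_i, y_j) charged by the plan
    points = (1.0 - t) * xs[i] + t * ys[j]  # position at time t on the segment [x_i, y_j]
    return points, plan[i, j]


xs = np.array([[0.0], [1.0]])  # hypothetical atoms in R^1
ys = np.array([[2.0], [3.0]])
plan = np.diag([0.5, 0.5])
points_half, weights_half = displacement_interpolation(xs, ys, plan, 0.5)
# points_half = [[1.0], [2.0]] with weights [0.5, 0.5]
```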

Presentation of the results. The quadratic cost is an important instance of transport cost, but our results are valid for any cost functions $c$ and $C$ satisfying (3) and (4), plus some coercivity properties. For each $k\ge1$, denote by $p^k\in P(X^2)$ the law of the couple of initial and final positions of the random path driven by $R^k\in P(\Omega)$. Then,

$$\frac1k H(\pi|p^k)\to\min;\qquad\pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1 \qquad (5)$$

is the static "projection" of (2).

In the sequel, any limit of sequences of probability measures is understood with respect to the usual narrow topology. Theorem 3.7 states that there exists a sequence $(\mu_1^k)_{k\ge1}$ in $P(X)$ such that $\lim_{k\to\infty}\mu_1^k=\mu_1$ and, as $k$ tends to infinity, the modified minimization problem

$$\frac1k H(\pi|p^k)\to\min;\qquad\pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1^k \qquad (6)$$

verifies the following two assertions:

• The minimal value of (6) tends to the minimal value of (1), where $c$ is given by (4), which is precisely the optimal transport cost $T_c(\mu_0,\mu_1)$;

• If $T_c(\mu_0,\mu_1)$ is finite, then for all large enough $k$, (6) admits a unique minimizer $\pi^k$, the sequence $(\pi^k)_{k\ge1}$ admits limit points in $P(X^2)$, and any such limit point is a solution to the Monge-Kantorovich problem (1), i.e. an optimal transport plan.
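The first assertion can be observed in a finite toy setting. The sketch below does not implement the construction of this article: assuming (hypothetically) finitely supported $\mu_0,\mu_1$ and the reference measure $\mu_0\otimes\mu_1$, it minimizes the penalized criterion $\int c\,d\pi+\frac1k H(\pi|\mu_0\otimes\mu_1)$ over couplings (the penalized form that the Literature paragraph relates to (5)) with log-domain Sinkhorn iterations, a standard numerical scheme swapped in here for illustration. Since $H(\pi|\mu_0\otimes\mu_1)$ is nonnegative and bounded on couplings of finitely supported marginals, the value lies between $T_c(\mu_0,\mu_1)$ and $T_c(\mu_0,\mu_1)+O(1/k)$.

```python
import numpy as np
from scipy.special import logsumexp


def entropic_ot(mu0, mu1, cost, k, n_iter=5000):
    """Log-domain Sinkhorn for  sum(cost * pi) + (1/k) H(pi | mu0 x mu1) -> min
    over couplings pi of (mu0, mu1). Returns (approximate minimal value, plan)."""
    eps = 1.0 / k
    f = np.zeros_like(mu0)
    g = np.zeros_like(mu1)
    for _ in range(n_iter):
        # pi_ij = mu0_i mu1_j exp((f_i + g_j - c_ij) / eps): fit row sums, then column sums.
        f = -eps * logsumexp((g[None, :] - cost) / eps, b=mu1[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - cost) / eps, b=mu0[:, None], axis=0)
    ref = np.outer(mu0, mu1)
    pi = ref * np.exp((f[:, None] + g[None, :] - cost) / eps)
    mask = pi > 0
    entropy = np.sum(pi[mask] * np.log(pi[mask] / ref[mask]))
    return float(np.sum(cost * pi) + eps * entropy), pi


# Hypothetical toy data: T_c = 2.0, attained by the plan diag(0.5, 0.5).
xs = np.array([0.0, 1.0])
ys = np.array([2.0, 3.0])
mu0 = np.array([0.5, 0.5])
mu1 = np.array([0.5, 0.5])
cost = 0.5 * (ys[None, :] - xs[:, None]) ** 2
results = [entropic_ot(mu0, mu1, cost, k) for k in (1, 10, 100)]
values = [v for v, _ in results]
plans = [p for _, p in results]
# values decrease towards T_c = 2.0 as k grows.
```

For this particular toy example the entropic values behave like $2+\log(2)/k$ for large $k$. The iteration count is an arbitrary choice, and Sinkhorn iterations converge more slowly as $k$ grows.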

It is not necessary that $c$ be derived from a dynamical cost $C$ via (4). A similar result holds in this more general setting; this is the content of Theorem 3.3. The dynamical analogue of this convergence result is stated in Theorem 3.6, and the connection between the dynamic and static minimizers is described in Theorem 3.7.

Examples of random dynamics $(R^k)_{k\ge1}$ are introduced. They are mainly based on random walks so that one can compute the corresponding cost functions $C$ and $c$. In particular, we propose dynamics which generate the standard costs $c_p(x,y):=|y-x|^p$, $x,y\in\mathbb R^d$, for any $p>0$; see Example 4.6 for such dynamics based on the Brownian motion.

We also prove technical results about Γ-convergence which we didn't find in the literature. They are efficient tools for the proofs of the above mentioned convergence results. A typical result about the Γ-convergence of a sequence of convex functions $(f_k)_{k\ge1}$ is: if the sequence of the convex conjugates $(f_k^*)_{k\ge1}$ converges pointwise, then $(f_k)_{k\ge1}$ Γ-converges. Known results of this type are usually stated in separable reflexive Banach spaces, which is a natural setting when working with PDEs. But here, we need to work with the narrow topology on the set of probability measures. Theorem 6.2 is such a result in this weak topology setting.

Finally, we also prove Theorem 7.1, which tells us that if one adds a continuous constraint to an equi-coercive sequence of Γ-converging minimization problems, then the minimal values and the minimizers of the new problems still enjoy nice convergence properties.

Literature. The connection between large deviations and optimal transport has already been made by Mikami [Mik04] in the context of the quadratic transport. Although no relative entropy appears in [Mik04], where an optimal control approach is performed, our results might be seen as extensions of Mikami's ones. In the same spirit, still using optimal control, Mikami and Thieullen [MT06, MT08] obtained Kantorovich type duality results. Recently, Adams, Dirr, Peletier and Zimmer [ADPZ] have shown that the small time large deviation behavior of a large particle system is equivalent up to the second order to a single step of the Jordan-Kinderlehrer-Otto gradient flow algorithm. This is reminiscent of the Schrödinger problem, but the connection is not yet completely understood.

The connection between the Monge-Kantorovich and the Schrödinger problems is also exploited implicitly in some works where (1) is penalized by a relative entropy, leading to the minimization problem

$$\int_{X^2}c\,d\pi+\frac1k H(\pi|p)\to\min;\qquad\pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1$$

where $p\in P(X^2)$ is a fixed reference probability measure on $X^2$, for instance $p=\mu_0\otimes\mu_1$. Putting $p^k=Z_k^{-1}e^{-kc}\,p$ with $Z_k=\int_{X^2}e^{-kc}\,dp<\infty$, up to the additive constant $\log(Z_k)/k$, this minimization problem rewrites as (5). See for instance the papers by Rüschendorf and Thomsen [RT93, RT98] and the references therein. Also interesting is the recent paper by Galichon and Salanié [GS] with an applied point of view.
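The rewriting of the penalized problem as (5) rests on the identity $\int c\,d\pi+\frac1k H(\pi|p)=\frac1k H(\pi|p^k)-\frac1k\log Z_k$, valid for every $\pi\ll p$, since $\log\frac{d\pi}{dp^k}=\log\frac{d\pi}{dp}+kc+\log Z_k$. A quick numerical check on a hypothetical $3\times3$ grid with random data:

```python
import numpy as np


def rel_entropy(p, q):
    """H(p|q) = sum p log(p / q) for finitely supported p << q, with 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


rng = np.random.default_rng(0)
k = 5.0
c = rng.uniform(0.0, 3.0, size=(3, 3))  # hypothetical cost on a 3 x 3 grid
p = rng.uniform(0.1, 1.0, size=(3, 3))  # reference measure p
p /= p.sum()
pi = rng.uniform(0.0, 1.0, size=(3, 3))  # any pi << p (here p > 0 everywhere)
pi /= pi.sum()

Z_k = np.sum(np.exp(-k * c) * p)
p_k = np.exp(-k * c) * p / Z_k  # p^k = Z_k^{-1} e^{-kc} p

penalized = np.sum(c * pi) + rel_entropy(pi, p) / k
rewritten = rel_entropy(pi, p_k) / k - np.log(Z_k) / k
```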

Proposition 3.4 below is an important technical step on the way to our main results. A variant of this proposition, under more restrictive assumptions than ours, was proved by Dawson and Gärtner [DG94, Thm 2.9] in a context which is different from optimal transport and with no motivation in this direction. Indeed, [DG94] is aimed at studying the large deviations of a large number of diffusion processes subject to a hierarchy of mean-field interactions, by means of random variables which live in $P(P(\Omega))$: the set of probability measures on the set of probability measures on the path space $\Omega$. The proofs of Proposition 3.4 in the present paper and in [DG94] differ significantly. Dawson-Gärtner's proof is essentially probabilistic while the author's one is analytic. The strategies of the proofs also differ: Dawson-Gärtner's proof is based on rather precise probability estimates which partly rely on the specific structure of the problem, while the present one takes place on the other side of convex duality, using the Laplace-Varadhan principle and Γ-convergence. Because of these significantly different proofs and of the weakening of the hypotheses in the present paper, we provide a complete analytic proof of Proposition 3.4 in Section 5.

Organization of the paper. Section 2 is devoted to the presentation of the Monge-Kantorovich and Schrödinger problems. We also show informally that they are tightly connected. Our main results are stated in Section 3. We also give there a simple illustration of these abstract results by means of Schrödinger's original example based on the Brownian motion. Since our primary object is the sequence of random processes $(R^k)_{k\ge1}$, it is necessary to connect it with the cost functions $C$ and $c$. This is the purpose of Section 4, where these cost functions are derived for a large family of random dynamics. The proofs of our main results are done in Section 5. They are partly based on two Γ-convergence results which are stated and proved in Sections 6 and 7. Finally, we recall some basic notions about large deviation theory in the Appendix.

Notation. Let us introduce our main notations. 

Measures. For any topological space $X$, we denote by $P(X)$ the set of all Borel probability measures on $X$ and we endow it with the usual narrow topology $\sigma(P(X),C_b(X))$ weakened by the space $C_b(X)$ of all continuous bounded functions on $X$. We also furnish $P(X)$ with the corresponding Borel σ-field.

The push-forward of the measure $m$ by the measurable map $f$ is denoted by $f_\#m$ and defined by $f_\#m(A)=m(f^{-1}(A))$ for any measurable set $A$.
The Dirac measure at $a$ is denoted by $\delta_a$.
Measures on a path space. We take a Polish space $X$ which is furnished with its Borel σ-field. The relevant space of paths from the time interval $[0,1]$ to the state space $X$ is either the space $\Omega=C([0,1],X)$ of all continuous paths, or the space $\Omega=D([0,1],X)$ of paths which are right continuous with left limits (càdlàg² paths). We denote by $X=(X_t)_{t\in[0,1]}$ the canonical process, which is defined for all $t\in[0,1]$ by

$$X_t(\omega):=\omega_t,\qquad\omega=(\omega_s)_{s\in[0,1]}\in\Omega.$$

For each $t\in[0,1]$, $X_t$ is the position at time $t$, which is seen as a map on $\Omega$. Of course, $X$ is the identity on $\Omega$. The set $\Omega$ is endowed with the σ-field $\sigma(X_t,\ t\in[0,1])$ which is generated by the canonical process. It is known that it matches the Borel σ-field of $\Omega$ when $\Omega$ is furnished with the Skorokhod topology³, which turns $\Omega$ into a Polish space, see [Bil68]. We denote by $P(X)$, $P(X^2)$ and $P(\Omega)$ the sets of all probability measures on $X$, $X^2=X\times X$ and $\Omega$ respectively. For any $P\in P(\Omega)$, i.e. $P$ is the law of a random path, we denote

$$P_t:=(X_t)_\#P\in P(X),\qquad t\in[0,1].$$

In particular, $P_0$ and $P_1$ are the laws of the initial and final random positions under $P$. Also useful is the joint law of the initial and final positions

$$P_{01}:=(X_0,X_1)_\#P\in P(X^2).$$

Of course, $P_{01}$ carries more information than the couple $(P_0,P_1)$ because of the correlation structure. A similar remark holds for $P\in P(\Omega)$ and $(P_t;\ t\in[0,1])\in P(X)^{[0,1]}$. We denote the disintegration of $P$ with respect to $(X_0,X_1)$ by $P(d\omega)=\int_{X^2}P^{xy}(d\omega)\,P_{01}(dxdy)$, where

$$P^{xy}(\cdot):=P(\cdot\mid X_0=x,\ X_1=y),\qquad x,y\in X$$

is the conditional law of $X$ knowing that $X_0=x$ and $X_1=y$ under $P$. It is usually called the bridge of $P$ between $x$ and $y$.
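On a finite state space and a coarse time grid, $P_{01}$ and the bridges $P^{xy}$ reduce to array operations. A minimal sketch, assuming (hypothetically) that paths are only observed at times $0$, $1/2$ and $1$, so that a law on paths is an array `P[x0, xm, x1]`; the function name is our own.

```python
import numpy as np


def endpoint_law_and_bridges(P):
    """Split a law on 3-time paths into its endpoint law and its bridges.

    P[x0, xm, x1] is the probability that the path visits x0, xm, x1 at times 0, 1/2, 1.
    Returns P01[x0, x1] = (X_0, X_1)#P and
    bridges[x0, x1, xm] = P(X_{1/2} = xm | X_0 = x0, X_1 = x1).
    """
    P01 = P.sum(axis=1)  # push-forward by (X_0, X_1)
    with np.errstate(invalid="ignore", divide="ignore"):
        bridges = np.transpose(P, (0, 2, 1)) / P01[:, :, None]
    # Where P01 vanishes the bridge is arbitrary; set it to 0 by convention.
    return P01, np.nan_to_num(bridges)


rng = np.random.default_rng(0)
P = rng.uniform(0.1, 1.0, size=(3, 3, 3))  # hypothetical law on 3-time paths
P /= P.sum()
P01, bridges = endpoint_law_and_bridges(P)
# Disintegration: P(x0, xm, x1) = P01(x0, x1) * P^{x0 x1}(xm)
reconstructed = np.einsum("xyz,xy->xzy", bridges, P01)
```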

When working with the product space $X^2$, one sees the first and second factors $X$ as the sets of initial and final states respectively. Therefore, the canonical projections are denoted $X_0(x,y):=x$ and $X_1(x,y):=y$, $(x,y)\in X^2$. We denote the marginals of the probability measure $\pi\in P(X^2)$ by $\pi_0:=(X_0)_\#\pi\in P(X)$ and $\pi_1:=(X_1)_\#\pi\in P(X)$.

Functions. Recall that a function $f:X\to(-\infty,\infty]$ is said to be lower semicontinuous on the topological space $X$ if all its sublevel sets $\{f\le a\}$, $a\in\mathbb R$, are closed. When $X$ is Hausdorff, $f$ is said to be coercive if its sublevel sets are compact.
Let $X$ and $Y$ be two topological vector spaces equipped with a duality bracket $\langle x,y\rangle\in\mathbb R$, that is a bilinear form on $X\times Y$. The convex conjugate $f^*$ of $f:X\to(-\infty,\infty]$ with respect to this duality bracket is defined by

$$f^*(y):=\sup_{x\in X}\{\langle x,y\rangle-f(x)\}\in[-\infty,\infty],\qquad y\in Y.$$

It is a convex $\sigma(Y,X)$-lower semicontinuous function.
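On the real line with $\langle x,y\rangle=xy$, the conjugate can be approximated by taking the supremum over a grid. This hedged sketch (grid sizes are arbitrary choices) recovers $f^*=f$ for $f(x)=x^2/2$, and shows that conjugating twice convexifies a nonconvex function: for $g(x)=\min((x-1)^2,(x+1)^2)$, the biconjugate $g^{**}$ vanishes on $[-1,1]$ although $g(0)=1$.

```python
import numpy as np


def convex_conjugate(xs, fx, ys):
    """Grid approximation of f*(y) = sup_x {x y - f(x)}, the sup being taken over xs."""
    return np.max(np.outer(ys, xs) - fx[None, :], axis=1)


xs = np.linspace(-5.0, 5.0, 2001)
ys = np.linspace(-3.0, 3.0, 61)

# f(x) = x^2 / 2 is its own conjugate.
f_star = convex_conjugate(xs, 0.5 * xs**2, ys)

# A nonconvex g: conjugating twice gives its convex envelope, which vanishes on [-1, 1].
g = np.minimum((xs - 1.0) ** 2, (xs + 1.0) ** 2)
g_star = convex_conjugate(xs, g, xs)
g_bistar_at_0 = convex_conjugate(xs, g_star, np.array([0.0]))[0]
```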

The relative entropy of the probability $P$ with respect to the probability $R$ is

$$H(P|R):=\begin{cases}\int\log\big(\frac{dP}{dR}\big)\,dP\in[0,\infty] & \text{if } P\ll R,\\ \infty & \text{otherwise.}\end{cases}$$



² This is the French acronym for continu à droite et limité à gauche (right continuous with left limits).

³ In the special case where the paths are continuous, $\Omega=C([0,1],X)$, this topology reduces to the topology of uniform convergence.
2. Monge-Kantorovich and Schrödinger problems

In this section we present the Monge-Kantorovich optimal transport problem and the Schrödinger entropy minimization problem. Then, with the aid of Schrödinger's original example, we show informally that they are connected to each other by letting some fluctuation coefficient tend to zero.

Warning. This informal section contains probabilistic material. The probability-allergic reader can skip it without harm. Nevertheless, it also contains some very clever ideas of Schrödinger which acted as a guide for the author.

The Monge-Kantorovich optimal transport problem. Let $c:X^2\to[0,\infty]$ be a lower semicontinuous function on $X^2$ with possibly infinite values. For any $x,y\in X$, $c(x,y)$ is interpreted as the cost for transporting a unit mass from location $x$ to location $y$. Let $\mu_0,\mu_1\in P(X)$ be two prescribed probability measures on $X$. An admissible transport plan from $\mu_0$ to $\mu_1$ is any probability measure $\pi\in P(X^2)$ which has its first and second marginals equal to $\pi_0=\mu_0$ and $\pi_1=\mu_1$, respectively. For such a $\pi$,

$$\int_{X^2}c\,d\pi\in[0,\infty]$$

is interpreted as the total cost for transporting $\mu_0$ to $\mu_1$ when choosing the plan $\pi$. The Monge-Kantorovich optimal transport problem is the corresponding minimization problem, i.e.

$$\int_{X^2}c\,d\pi\to\min;\qquad\pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1. \qquad (\mathrm{MK})$$

A minimizer $\pi\in P(X^2)$ is called an optimal plan and the minimal value $\inf(\mathrm{MK})\in[0,\infty]$ is the optimal transport cost. Remark that (MK) is a convex minimization problem. But, as it is not a strictly convex problem, it might admit several solutions.

The Schrödinger entropy minimization problem. Take a reference process $R$ on $\Omega$ (the unusual letter $R$ is chosen as a reminder of reference). By this, it is meant a positive σ-finite measure on $\Omega$ which is not necessarily bounded. Consider $n$ independent random dynamic particles $(Y^i;\ 1\le i\le n)$ where each random realization of $Y^i$ lives in $\Omega$. More specifically, $(Y^i;\ 1\le i\le n)$ is a collection of independent random paths where, for each $i$, the law of $Y^i$ is given by

$$\mathrm{Law}(Y^i\mid Y^i_0)=R(\cdot\mid X_0=Y^i_0)\in P(\Omega) \qquad (7)$$

and $(Y^i_0;\ 1\le i\le n)$ should be interpreted as the random initial positions.

Example 2.1 (Schrödinger's heat bath). As a typical example, one can take $R$ to be the law of the Brownian motion (Wiener process) on $X=\mathbb R^d$ with diffusion coefficient $\sigma^2$ and the Lebesgue measure as its initial distribution. The random motions are described by

$$Y^i_t=Y^i_0+\sigma B^i_t,\qquad 1\le i\le n,\ t\in[0,1]$$

where $Y^1_0,\dots,Y^n_0$ are random independent initial positions, $B^1,\dots,B^n$ are independent Brownian motions with initial position $B^i_0=0\in\mathbb R^d$, $i=1,\dots,n$, and $\sigma>0$ is the square root of the temperature $\sigma^2$.
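The law-of-large-numbers statement in the next paragraph can be observed by simulation. The sketch below is a hedged illustration under hypothetical choices: $d=1$, $\mu_0=\mathcal N(0,1)$ (a probability measure replaces the Lebesgue initial law, which cannot be sampled), $\sigma=0.5$ and a grid of 50 time steps. The final empirical configuration $L^n_1$ should then be close to $\mu_0*\mathcal N(0,\sigma^2)=\mathcal N(0,1+\sigma^2)$.

```python
import numpy as np


def simulate_heat_bath(y0, sigma, n_steps, rng):
    """Sample the paths Y^i_t = Y^i_0 + sigma * B^i_t on the grid t = j / n_steps.

    y0: (n,) initial positions in dimension d = 1.
    Returns an array of shape (n, n_steps + 1); column j is the configuration at t = j / n_steps.
    """
    n = y0.shape[0]
    dt = 1.0 / n_steps
    # Independent Brownian increments B_{t+dt} - B_t ~ N(0, dt), scaled by sigma.
    increments = sigma * np.sqrt(dt) * rng.standard_normal((n, n_steps))
    paths = np.empty((n, n_steps + 1))
    paths[:, 0] = y0
    paths[:, 1:] = y0[:, None] + np.cumsum(increments, axis=1)
    return paths


rng = np.random.default_rng(1)
y0 = rng.standard_normal(20_000)  # hypothetical mu_0 = N(0, 1)
paths = simulate_heat_bath(y0, sigma=0.5, n_steps=50, rng=rng)
final = paths[:, -1]  # atoms of the empirical measure L^n_1
# final.mean() is close to 0 and final.var() close to 1 + 0.5**2 = 1.25.
```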

Schrödinger's original problem [Sch32], which is based on this specific example, can be stated as follows. Suppose that at time $t=0$ you observe a large particle system approximately in the configuration $\mu_0\in P(\mathbb R^d)$. The law of large numbers tells you that, with a very high probability, you observe the system at time $t=1$ in a configuration very near the convolution $\mu_0*\mathcal N(0,\sigma^2\mathrm{Id})$ of $\mu_0$ and the normal density with mean $0$ and covariance matrix $\sigma^2\mathrm{Id}$ in $\mathbb R^d$. But, since the number $n$ of particles is finite, it is still possible (with a tiny probability of order $e^{-an}$ with $a>0$) to observe the system at time $t=1$ in a configuration which is significantly far from the expected profile of distribution $\mu_0*\mathcal N(0,\sigma^2\mathrm{Id})$. Now, Schrödinger's question is⁴: suppose that you observe the system at time $t=1$ in a configuration which is approximately $\mu_1\in P(\mathbb R^d)$ and that $\mu_1$ is significantly different from $\mu_0*\mathcal N(0,\sigma^2\mathrm{Id})$; what is the most likely path of the whole system from $\mu_0$ to $\mu_1$ during the time interval $[0,1]$? In [Sch32], Schrödinger gave the complete answer to this question with a proof based on Stirling's formula. Although his proof is informal, there is nothing significant to be added today to his answer.

The modern way of addressing this problem is in terms of large deviations, see [DZ98] for an excellent overview of large deviation theory (a short reminder about large deviation theory is also given in the Appendix). This has been done by Föllmer in his Saint-Flour lecture notes [Föl88]. Denoting $\delta_a$ the unit mass Dirac measure at $a$, the whole system is described by its empirical measure

$$L^n:=\frac1n\sum_{i=1}^n\delta_{Y^i}\in P(\Omega).$$

It is a $P(\Omega)$-valued random variable⁵ which contains all the information about the dynamic system up to any permutation of the labels of the particles. Nothing is lost when the particles are indistinguishable. It also contains more information than the random path

$$t\in[0,1]\mapsto L^n_t:=\frac1n\sum_{i=1}^n\delta_{Y^i_t}\in P(X)$$

which describes the evolution of the configurations. The observed initial and final configurations are the empirical measures

$$L^n_0=\frac1n\sum_{i=1}^n\delta_{Y^i_0}\qquad\text{and}\qquad L^n_1=\frac1n\sum_{i=1}^n\delta_{Y^i_1}.$$



Now, we give an informal presentation of the answer to Schrödinger's question. For a rigorous treatment, one can have a look at [Föl88]. Take $\mu_0,\mu_1\in P(X)$ and $C_0^\delta,C_1^\delta$ two δ-neighborhoods in $P(X)$ (with respect to a distance on $P(X)$ compatible with the narrow topology $\sigma(P(X),C_b(X))$) of $\mu_0$ and $\mu_1$ respectively. One can recast Schrödinger's problem as follows: find the measurable sets $A\subset P(\Omega)$ such that the conditional probability

$$\mathbb P(L^n\in A\mid L^n_0\in C_0^\delta,\ L^n_1\in C_1^\delta)$$

is maximal when $n$ is large and $\delta$ is small. We introduced the δ-blowups $C_0^\delta$ and $C_1^\delta$ to avoid dividing by zero. A slight variant of Sanov's theorem⁶, a standard large deviation result whose exact statement is in terms of large deviation principle (see Theorem A.1 in the Appendix), states that

$$\mathbb P(L^n\in A)\asymp\exp\big(-n\inf\{H(P|R);\ P\in A\}\big),\qquad A\subset P(\Omega)$$



⁴ In Schrödinger's words (translated from the French): "Imagine that you observe a system of diffusing particles which are in thermodynamic equilibrium. Suppose that at a given instant $t_0$ you found them in an almost uniform distribution and that at $t_1>t_0$ you found a spontaneous and considerable deviation from this uniformity. You are asked in which way this deviation occurred. What is its most probable way?"

⁵ Strictly speaking, $L^n$ is a measurable function with its values in $P(\Omega)$ and the statement "$L^n\in P(\Omega)$" is an abuse of notation. Nevertheless it is a useful shorthand which will be used below without warning.

⁶ If $R$ is a probability measure, then this is really Sanov's theorem and $H(P|R)\in[0,\infty]$.



for any measurable subset $A$ of $P(\Omega)$ (by the way, one must define a σ-field on $P(\Omega)$), where

$$H(P|R):=\int_\Omega\log\Big(\frac{dP}{dR}\Big)\,dP\in(-\infty,\infty],\qquad P\in P(\Omega)$$

is the relative entropy of $P$ with respect to $R$. Under integrability conditions on $\mu_0$ and $\mu_1$ which ensure that there exists some $P\in P(\Omega)$ such that $P_0=\mu_0$, $P_1=\mu_1$ and $H(P|R)<\infty$, one deduces that



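$$\mathbb P(L^n\in A\mid L^n_0\in C_0^\delta,\ L^n_1\in C_1^\delta)\asymp\frac{\exp\big(-n\inf\{H(P|R);\ P\in A,\ P_0\in C_0^\delta,\ P_1\in C_1^\delta\}\big)}{\exp\big(-n\inf\{H(P|R);\ P_0\in C_0^\delta,\ P_1\in C_1^\delta\}\big)}.$$

It follows that we have the conditional law of large numbers

$$\lim_{n\to\infty}\mathbb P(L^n\in A\mid L^n_0\in C_0^\delta,\ L^n_1\in C_1^\delta)=\begin{cases}1&\text{if }A\ni P^\delta\\0&\text{otherwise}\end{cases}$$

where $P^\delta$ is the unique solution of the entropy minimization problem

$$H(P|R)\to\min;\qquad P\in P(\Omega):\ P_0\in C_0^\delta,\ P_1\in C_1^\delta.$$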

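The uniqueness comes directly from the strict convexity of the relative entropy $H(\cdot|R)$. We finally see that, under some conditions on the limits $C_0^\delta\downarrow\{\mu_0\}$ and $C_1^\delta\downarrow\{\mu_1\}$, the solution to the Schrödinger problem is given by

$$\lim_{\delta\to0}\lim_{n\to\infty}\mathbb P(L^n\in A\mid L^n_0\in C_0^\delta,\ L^n_1\in C_1^\delta)=\begin{cases}1&\text{if }A\ni\widehat P\\0&\text{otherwise}\end{cases}$$

where $\widehat P$ is the unique solution of the Schrödinger entropy minimization problem:

$$H(P|R)\to\min;\qquad P\in P(\Omega):\ P_0=\mu_0,\ P_1=\mu_1. \qquad (\mathrm S)$$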

If one prefers relative entropies with respect to probability measures, $\widehat P$ is also the unique solution to

$$H(P|R^{\mu_0})\to\min;\qquad P\in P(\Omega):\ P_0=\mu_0,\ P_1=\mu_1 \qquad (8)$$

where the σ-finite measure $R$ has been replaced by the probability measure

$$R^{\mu_0}(d\omega):=\int_X R(d\omega\mid X_0=x)\,\mu_0(dx)\in P(\Omega) \qquad (9)$$

which is the law of the process with initial distribution $\mu_0\in P(X)$ and the same dynamics as $R$. This last formulation is analytically suitable, but it introduces an artificial time asymmetry. Nevertheless, we keep it because it will be useful. Remark that

$$H(P|R^{\mu_0})\in[0,\infty]$$

is nonnegative and that the minimization problems (S) and (8) share the same minimizer under the constraint $P_0=\mu_0$, since $H(P|R^{\mu_0})=H(P|R)-\int_X\log\frac{d\mu_0}{dR_0}\,dP_0=H(P|R)-\int_X\log\frac{d\mu_0}{dR_0}\,d\mu_0$.
The minimization problem (S) looks like the Monge-Kantorovich problem (MK), but we can do better in this direction, relying on the tensorization property of the relative entropy. Namely, for any measurable function $\phi:\Omega\to Z$, where $Z$ is any Polish space with its Borel σ-field, we have

$$H(P|R)=H(\phi_\#P|\phi_\#R)+\int_Z H\big(P(\cdot\mid\phi=z)\,\big|\,R(\cdot\mid\phi=z)\big)\,\phi_\#P(dz). \qquad (10)$$
With $\phi=(X_0,X_1)$, this gives us

$$H(P|R)=H(P_{01}|R_{01})+\int_{X^2}H(P^{xy}|R^{xy})\,P_{01}(dxdy). \qquad (11)$$
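The decomposition (11) can be checked numerically on a finite path space. In this hedged sketch (hypothetical data and helper names), paths are observed at the three times $0,1/2,1$ only, so that a law on paths is an array indexed by $(x_0,x_{1/2},x_1)$, and the bridge $P^{xy}$ is the conditional law of the middle position.

```python
import numpy as np


def rel_entropy(p, q):
    """H(p|q) = sum p log(p / q) for finitely supported p << q, with 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def chain_rule_terms(P, R):
    """Both sides of (11) for laws P, R on 3-time paths, arrays indexed [x0, xm, x1], R > 0."""
    P01, R01 = P.sum(axis=1), R.sum(axis=1)
    rhs = rel_entropy(P01, R01)
    for x in range(P.shape[0]):
        for y in range(P.shape[2]):
            if P01[x, y] > 0:
                bridge_P = P[x, :, y] / P01[x, y]  # P^{xy}
                bridge_R = R[x, :, y] / R01[x, y]  # R^{xy}
                rhs += P01[x, y] * rel_entropy(bridge_P, bridge_R)
    return rel_entropy(P, R), rhs


rng = np.random.default_rng(2)
P = rng.uniform(0.05, 1.0, size=(2, 3, 2))  # hypothetical laws on 3-time paths
P /= P.sum()
R = rng.uniform(0.05, 1.0, size=(2, 3, 2))
R /= R.sum()
lhs, rhs = chain_rule_terms(P, R)
```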

Now, decomposing the marginal constraint $P_0=\mu_0$, $P_1=\mu_1$ into $P_{01}=\pi\in P(X^2)$ and $(X_0)_\#\pi=\mu_0$, $(X_1)_\#\pi=\mu_1$, we obtain

$$\inf\{H(P|R);\ P\in P(\Omega):\ P_0=\mu_0,\ P_1=\mu_1\}=\inf\Big\{\inf\{H(P|R);\ P\in P(\Omega):\ P_{01}=\pi\};\ \pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1\Big\}.$$
With (11), we see that the inner term is

$$\inf\{H(P|R);\ P\in P(\Omega):\ P_{01}=\pi\}=H(\pi|R_{01})+\inf\Big\{\int_{X^2}H(P^{xy}|R^{xy})\,\pi(dxdy);\ P\in P(\Omega):\ P_{01}=\pi\Big\}=H(\pi|R_{01})$$

where the inf is uniquely attained when $P^{xy}=R^{xy}$ for $\pi$-almost every $(x,y)$, since in this case $H(P^{xy}|R^{xy})=0$, which is the minimal value of the relative entropy. This also shows that for each $\pi\in P(X^2)$,

$$\inf\{H(P|R);\ P\in P(\Omega):\ P_{01}=\pi\}=H(R^\pi|R)=H(\pi|R_{01}) \qquad (12)$$

where

$$R^\pi(\cdot):=\int_{X^2}R^{xy}(\cdot)\,\pi(dxdy) \qquad (13)$$

is the mixture of the bridges $R^{xy}$ with $\pi$ as a mixing measure. Hence, the solution of (S) is

$$\widehat P=R^{\widehat\pi}$$

where $\widehat\pi\in P(X^2)$ is the unique solution of

$$H(\pi|R_{01})\to\min;\qquad\pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1. \qquad (14)$$
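For finitely many states, (14) can be solved by the classical iterative proportional fitting procedure (also known as Sinkhorn's algorithm), which alternately rescales rows and columns. It is not used in this article and is shown here only as a hedged illustration: the minimizer has the product form $\pi(x,y)=a(x)\,R_{01}(x,y)\,b(y)$. In the sketch, `R01` is assumed to be an arbitrary positive matrix (it need not be normalized, mirroring the σ-finite $R$), and all names and data are hypothetical.

```python
import numpy as np


def schrodinger_static(R01, mu0, mu1, n_iter=500):
    """Iterative proportional fitting for H(pi|R01) -> min with marginals mu0, mu1.

    R01: positive (m, n) matrix; mu0: (m,), mu1: (n,) probability vectors.
    Returns pi = diag(a) @ R01 @ diag(b) with (approximately) the prescribed marginals.
    """
    a = np.ones(R01.shape[0])
    b = np.ones(R01.shape[1])
    for _ in range(n_iter):
        a = mu0 / (R01 @ b)  # makes the row sums equal to mu0
        b = mu1 / (R01.T @ a)  # makes the column sums equal to mu1
    return a[:, None] * R01 * b[None, :]


def rel_entropy(p, q):
    """sum p log(p / q) over the support of p (q may be unnormalized)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


rng = np.random.default_rng(1)
R01 = rng.uniform(0.1, 1.0, size=(4, 5))  # hypothetical reference joint law
mu0 = rng.dirichlet(np.ones(4))
mu1 = rng.dirichlet(np.ones(5))
pi = schrodinger_static(R01, mu0, mu1)
```

A perturbation of $\pi$ with zero row and column sums keeps the constraints and should not decrease $H(\pi|R_{01})$, which gives a cheap optimality check.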

Connecting (S) and (MK). The problem (14) is similar to (MK), but something remains to be done in order to connect $R_{01}$ with some cost function $c$. We are going to do it in the special case which is described in Example 2.1, by letting the temperature

$$1/k:=\sigma^2$$

tend to zero, where $k\ge1$ describes the positive integers. The general situation will be investigated later in Section 3.

Let us make the $k$-dependence explicit in our notation. We denote by $R^k$ the law of the process $Y^k$ which is defined by

$$Y^k_t=Y_0+\sqrt{1/k}\,B_t,\qquad 0\le t\le1 \qquad (15)$$

with $Y_0$ having the Lebesgue measure as its distribution. In particular, the joint law of the initial and final positions under this reference process at positive temperature $1/k$ is

$$R^k_{01}(dxdy)=dx\,(2\pi/k)^{-d/2}\exp\Big(-k\frac{|y-x|^2}{2}\Big)\,dy.$$

Rewriting (14) with the $k$-dependence made explicit, we get

$$\frac1k H(\pi|R^k_{01})\to\min;\qquad\pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1. \qquad (\widetilde{\mathrm S}^k_{01})$$
We "tilde" the name of this problem, because there will be another "untilded" problem later:

$$\frac1k H(\pi|R^{k,\mu_0}_{01})\to\min;\qquad\pi\in P(X^2):\ \pi_0=\mu_0,\ \pi_1=\mu_1^k \qquad (\mathrm S^k_{01})$$

with better convergence properties, where the constraint $\mu_1$ is replaced by the "moving" constraint $\mu_1^k$, which is indexed by $k$ and satisfies $\lim_{k\to\infty}\mu_1^k=\mu_1$, and $R^k_{01}$ is replaced by $R^{k,\mu_0}_{01}$, see (9).

We have also introduced the renormalization $\frac1k H(\cdot|R^k_{01})$. To see that $1/k$ is the right multiplying factor, suppose that $\pi$ is such that $H(\pi|R^k_{01})<\infty$. This implies that $\pi\ll R^k_{01}\ll\lambda$, where $\lambda(dxdy)=dxdy$ stands for the Lebesgue measure on $\mathbb R^d\times\mathbb R^d$. We see that

$$H(\pi|R^k_{01})=H(\pi|\lambda)+\frac d2\log(2\pi/k)+k\int_{\mathbb R^d\times\mathbb R^d}\frac{|y-x|^2}{2}\,\pi(dxdy).$$

If we assume that $\pi$ satisfies $\int_{\mathbb R^d\times\mathbb R^d}(|x|^2+|y|^2)\,\pi(dxdy)<\infty$, we obtain that the Boltzmann entropy $H(\pi|\lambda)$ is finite and

$$\lim_{k\to\infty}\frac1k H(\pi|R^k_{01})=\int_{\mathbb R^d\times\mathbb R^d}\frac{|y-x|^2}{2}\,\pi(dxdy)$$

which is the cost for transporting $\pi_0$ to $\pi_1$ with respect to the quadratic cost

$$c(x,y)=|y-x|^2/2,\qquad x,y\in\mathbb R^d.$$

This indicates that $(\widetilde{\mathrm S}^k_{01})$ might converge to (MK) in some sense, as $k$ tends to infinity. Indeed, this will be made precise and proved in the subsequent pages.
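The expansion of $H(\pi|R^k_{01})$ above can be checked in dimension $d=1$ with a Gaussian $\pi=\mathcal N(0,\Sigma)$ on $\mathbb R^2$, for which $H(\pi|\lambda)=-\frac12\log\big((2\pi e)^2\det\Sigma\big)$. This hedged sketch (the covariance $\Sigma=\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix}$ with $\rho=0.8$ and the sample size are arbitrary choices) compares the closed form with a Monte Carlo estimate of $\int\log\frac{d\pi}{dR^k_{01}}\,d\pi$, and shows $\frac1k H(\pi|R^k_{01})$ approaching $\int\frac{|y-x|^2}{2}\,d\pi=1-\rho$.

```python
import numpy as np


def log_gauss2(z, cov):
    """Log-density of the centered Gaussian N(0, cov) on R^2, evaluated at the rows of z."""
    inv = np.linalg.inv(cov)
    quad = np.einsum("ni,ij,nj->n", z, inv, z)
    return -0.5 * quad - 0.5 * np.log(np.linalg.det(cov)) - np.log(2.0 * np.pi)


def log_r01_density(z, k):
    """log of dR^k_01 / d(lambda) at (x, y) = rows of z, in dimension d = 1."""
    x, y = z[:, 0], z[:, 1]
    return -0.5 * np.log(2.0 * np.pi / k) - 0.5 * k * (y - x) ** 2


def entropy_closed_form(cov, k):
    """H(pi|R^k_01) = H(pi|lambda) + (1/2) log(2 pi / k) + k E|Y - X|^2 / 2 for pi = N(0, cov)."""
    h_lambda = -0.5 * np.log((2.0 * np.pi * np.e) ** 2 * np.linalg.det(cov))
    mean_sq = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]  # E|Y - X|^2
    return h_lambda + 0.5 * np.log(2.0 * np.pi / k) + 0.5 * k * mean_sq


rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])  # hypothetical pi = N(0, cov)
rng = np.random.default_rng(0)
z = rng.multivariate_normal(np.zeros(2), cov, size=200_000)
mc_estimate = np.mean(log_gauss2(z, cov) - log_r01_density(z, 10.0))
scaled = [entropy_closed_form(cov, k) / k for k in (10.0, 1e3, 1e6)]
# mc_estimate is close to entropy_closed_form(cov, 10.0); scaled tends to 1 - rho = 0.2.
```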
The renormalized problem (S) with the dependence on $k$ made explicit is

$$\frac1k H(P|R^k)\to\min;\qquad P\in P(\Omega):\ P_0=\mu_0,\ P_1=\mu_1. \qquad (\widetilde{\mathrm S}^k)$$

Because of (13), its solution is $R^{k,\widehat\pi^k}$, where $\widehat\pi^k$ is the solution to $(\widetilde{\mathrm S}^k_{01})$. Remark that for two distinct $k,k'>0$, the supports of $R^k$ and $R^{k'}$ are disjoint subsets of $\Omega$. Therefore, at the level of the process laws $P\in P(\Omega)$, we see that for all $P\in P(\Omega)$, $H(P|R^k)=\infty$ for every $k\ge1$ except possibly one. It appears that the pointwise limit of $\frac1k H(\cdot|R^k)$ as $k$ tends to infinity is irrelevant. We shall see that the good notion of convergence is that of Γ-convergence. Also, we shall need the following "untilded" variant of $(\widetilde{\mathrm S}^k)$:

$$\frac1k H(P|R^{k,\mu_0})\to\min;\qquad P\in P(\Omega):\ P_0=\mu_0,\ P_1=\mu_1^k, \qquad (\mathrm S^k)$$

where $\lim_{k\to\infty}\mu_1^k=\mu_1$.

3. Statement of the main results 

Our results are stated in terms of $\Gamma$-convergence and of the large deviation principle. We start by introducing their definitions.

$\Gamma$-convergence. We refer to the monograph by Dal Maso [Mas93] for a clear exposition of the subject. Recall that, if it exists, the $\Gamma$-limit of the sequence $(f_k)_{k\ge1}$ of $(-\infty,\infty]$-valued functions on a topological space $X$ is given for all $x$ in $X$ by

$$\Gamma\text{-}\lim_{k\to\infty}f_k(x)=\sup_{V\in\mathcal N(x)}\ \liminf_{k\to\infty}\ \inf_{y\in V}f_k(y),$$

where $\mathcal N(x)$ is the set of all neighborhoods of $x$. In a metric space $X$, this is equivalent to the following two conditions.

(i) For any sequence $(x_k)_{k\ge1}$ such that $\lim_{k\to\infty}x_k=x$,
$$\liminf_{k\to\infty}f_k(x_k)\ge f(x).$$

(ii) There exists a sequence $(x_k)_{k\ge1}$ such that $\lim_{k\to\infty}x_k=x$ and
$$\liminf_{k\to\infty}f_k(x_k)\le f(x).$$

Item (i) is called the lower bound, and the sequence $(x_k)_{k\ge1}$ in item (ii) is called the recovery sequence.
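The following one-line example shows how the $\Gamma$-limit can differ from the pointwise limit. This is the phenomenon behind the remark above that the pointwise limit of $\frac1kH(\cdot|\tilde R^k)$ is irrelevant. The example is our own illustration and is not taken from [Mas93]. On $X=\mathbb R$, take $f_k=1$ everywhere except at the single point $1/k$, where $f_k(1/k)=0$:

$$f_k(x)=\begin{cases}0 & x=1/k\\ 1 & x\neq 1/k\end{cases},\qquad \lim_{k\to\infty}f_k(x)=1\ \text{ for every }x\in\mathbb R,\qquad \Gamma\text{-}\lim_{k\to\infty}f_k(x)=\begin{cases}0 & x=0\\ 1 & x\neq 0.\end{cases}$$

At $x=0$, the recovery sequence is $x_k=1/k$. The minimizers $1/k$ of $f_k$ converge to $0$, which is the minimizer of the $\Gamma$-limit. The pointwise limit has lost this information.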



Large deviation principle. We refer to the monograph by Dembo and Zeitouni [DZ98] for a clear exposition of the subject. Let $X$ be a Polish space furnished with its Borel $\sigma$-field. The sequence $(\gamma_n)_{n\ge1}$ of probability measures on $X$ is said to satisfy the large deviation principle (LDP for short) with scale $n$ and rate function $I$ if, for each Borel measurable subset $A$ of $X$,

$$-\inf_{x\in\operatorname{int}A}I(x)\ \overset{(i)}{\le}\ \liminf_{n\to\infty}\frac1n\log\gamma_n(A)\ \le\ \limsup_{n\to\infty}\frac1n\log\gamma_n(A)\ \overset{(ii)}{\le}\ -\inf_{x\in\operatorname{cl}A}I(x). \tag{16}$$

Here $\operatorname{int}A$ and $\operatorname{cl}A$ are respectively the topological interior and closure of $A$ in $X$, and the rate function $I:X\to[0,\infty]$ is lower semicontinuous. The inequalities (i) and (ii) are called respectively the LD lower bound and the LD upper bound, where LD is an abbreviation for large deviation.

The LDP is the exact statement of what was meant in the previous section when writing

$$\gamma_n(A)\asymp\exp\Big(-n\inf_{x\in A}I(x)\Big)$$

for "all" $A\subset X$.
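As a minimal worked instance of (16), added here for illustration, let $\gamma_n$ be the centered Gaussian law on $\mathbb R$ with variance $1/n$, and let $A=[a,\infty)$ with $a>0$. The standard Gaussian tail estimate gives

$$\log\gamma_n([a,\infty))=-\frac{na^2}{2}-\log\big(a\sqrt{2\pi n}\big)+o(1),\qquad\text{hence}\qquad\lim_{n\to\infty}\frac1n\log\gamma_n([a,\infty))=-\frac{a^2}{2}=-\inf_{x\ge a}\frac{x^2}{2}.$$

Here $\operatorname{int}A$ and $\operatorname{cl}A$ give the same infimum. Both inequalities in (16) are therefore equalities, with $I(x)=x^2/2$.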



The main results. For any topological space $X$, we denote by $P(X)$ the set of all Borel probability measures on $X$. We endow it with the usual weak topology $\sigma(P(X),C_b(X))$ weakened by the space $C_b(X)$ of all continuous bounded functions on $X$. We also furnish $P(X)$ with the corresponding Borel $\sigma$-field.

A function $f:X\to(-\infty,\infty]$ is said to be coercive if, for any real $a\ge\inf f$, the sublevel set $\{f\le a\}$ is compact. This implies that $f$ is lower semicontinuous if $X$ is Hausdorff, which will be the case for all the topological spaces in the sequel.

The convex analysis indicator of a set $A\subset X$ is defined by

$$\iota_{\{x\in A\}}=\iota_A(x)=\begin{cases}0 & \text{if }x\in A\\ +\infty & \text{otherwise}\end{cases},\quad x\in X.$$

We keep the notation of Section 2. In particular, $X$ is a Polish space (complete separable metric space) with its Borel $\sigma$-field, and

$$\Omega=D([0,1],X)$$

is the set of all càdlàg $X$-valued paths endowed with the Skorokhod metric, which turns it into a Polish space.
Static version. For each integer $k\ge1$, we take a measurable kernel

$$(p^{k,x}\in P(X);\ x\in X)$$

of probability measures on $X$. We also take $\mu_0\in P(X)$, denote

$$p^{k,\mu_0}(dxdy):=\mu_0(dx)\,p^{k,x}(dy)\in P(X^2),$$

and define the functions

$$C^{k,\mu_0}_{01}(\pi):=\frac1k H(\pi|p^{k,\mu_0})+\iota_{\{\pi_0=\mu_0\}},\ k\ge1;\qquad C^{\mu_0}_{01}(\pi):=\int_{X^2}c\,d\pi+\iota_{\{\pi_0=\mu_0\}},\quad\pi\in P(X^2).$$

Proposition 3.1. We assume that, for each $x\in X$, the sequence $(p^{k,x})_{k\ge1}$ satisfies the LDP in $X$ with scale $k$ and the coercive rate function

$$c(x,\cdot):X\to[0,\infty],$$

where $c:X^2\to[0,\infty]$ is a lower semicontinuous function.

Then, for any $\mu_0\in P(X)$, we have $\Gamma\text{-}\lim_{k\to\infty}C^{k,\mu_0}_{01}=C^{\mu_0}_{01}$ in $P(X^2)$.

Let us define the functions

$$T^k_{01}(\mu_0,\nu):=\inf\Big\{\frac1k H(\pi|p^{k,\mu_0});\ \pi\in P(X^2):\pi_0=\mu_0,\ \pi_1=\nu\Big\}=\inf\{C^{k,\mu_0}_{01}(\pi);\ \pi\in P(X^2):\pi_1=\nu\},\quad\nu\in P(X)$$

and

$$T_{01}(\mu_0,\nu):=\inf\Big\{\int_{X^2}c\,d\pi;\ \pi\in P(X^2):\pi_0=\mu_0,\ \pi_1=\nu\Big\}=\inf\{C^{\mu_0}_{01}(\pi);\ \pi\in P(X^2):\pi_1=\nu\},\quad\nu\in P(X).$$

The subsequent result follows easily from Proposition 3.1.

Corollary 3.2. Under the assumptions of Proposition 3.1, for any $\mu_0\in P(X)$ we have

$$\Gamma\text{-}\lim_{k\to\infty}T^k_{01}(\mu_0,\cdot)=T_{01}(\mu_0,\cdot)$$

on $P(X)$. In particular, for any $\mu_1\in P(X)$, there exists a sequence $(\mu_1^k)_{k\ge1}$ such that $\lim_{k\to\infty}\mu_1^k=\mu_1$ in $P(X)$ and $\lim_{k\to\infty}T^k_{01}(\mu_0,\mu_1^k)=T_{01}(\mu_0,\mu_1)\in[0,\infty]$.

Now, let us consider a sequence of minimization problems which generalizes $(S^k_{01})_{k\ge1}$ of Section 2. For each $k\ge1$, it is given by

$$\frac1k H(\pi|p^{k,\mu_0})\to\min;\quad\pi\in P(X^2):\pi_0=\mu_0,\ \pi_1=\mu_1^k, \tag{$S^k_{01}$}$$

where $(\mu_1^k)_{k\ge1}$ is a sequence in $P(X)$ as in Corollary 3.2. The Monge-Kantorovich problem associated with $(S^k_{01})_{k\ge1}$ is

$$\int_{X^2}c\,d\pi\to\min;\quad\pi\in P(X^2):\pi_0=\mu_0,\ \pi_1=\mu_1. \tag{MK}$$

The main result of the paper is the following theorem. 

Theorem 3.3. Under the assumptions of Proposition 3.1, for any $\mu_0,\mu_1\in P(X)$ we have $\lim_{k\to\infty}\inf(S^k_{01})=\inf(\mathrm{MK})\in[0,\infty]$.

Suppose in addition that $\inf(\mathrm{MK})<\infty$. Then, for each large enough $k$, $(S^k_{01})$ admits a unique solution $\pi^k\in P(X^2)$.

Moreover, any limit point of the sequence $(\pi^k)_{k\ge1}$ in $P(X^2)$ is a solution to (MK). In particular, if (MK) admits a unique solution $\pi\in P(X^2)$, then $\lim_{k\to\infty}\pi^k=\pi$ in $P(X^2)$.
Remark that $\lim_{k\to\infty}\inf(S^k_{01})=\inf(\mathrm{MK})$ is a restatement of $\lim_{k\to\infty}T^k_{01}(\mu_0,\mu_1^k)=T_{01}(\mu_0,\mu_1)$ in Corollary 3.2.

Proposition 3.1 and Theorem 3.3 admit a dynamic version.

Dynamical version. For each integer $k\ge1$, we take a measurable kernel

$$(R^{k,x}\in P(\Omega^x);\ x\in X)$$

of probability measures on $\Omega$, with

$$\Omega^x:=\{X_0=x\}.$$

We have in mind the situation where $R^k\in P(\Omega)$ is the law of a stochastic process and $R^{k,x}=R^k(\cdot\mid X_0=x)$ is its conditional law knowing that $X_0=x$, see (7). For any $\mu_0\in P(X)$, denote

$$R^{k,\mu_0}(\cdot):=\int_X R^{k,x}(\cdot)\,\mu_0(dx)\in P(\Omega),\qquad R^{k,\mu_0}_{01}(\cdot):=\int_X R^{k,x}_{01}(\cdot)\,\mu_0(dx)\in P(X^2).$$

$R^{k,\mu_0}$ is the law of a stochastic process with initial law $\mu_0$, whose dynamics are determined by $(R^{k,x};x\in X)$, where $x$ is interpreted as an initial position. Meanwhile, $R^{k,\mu_0}_{01}=(X_0,X_1)_\#R^{k,\mu_0}$ is the joint law of the initial and final positions under $R^{k,\mu_0}$. Let us define the functions

$$C^{k,\mu_0}(P):=\frac1k H(P|R^{k,\mu_0})+\iota_{\{P_0=\mu_0\}},\ k\ge1;\qquad C^{\mu_0}(P):=\int_\Omega C\,dP+\iota_{\{P_0=\mu_0\}},\quad P\in P(\Omega),$$

where $C:\Omega\to[0,\infty]$ is a lower semicontinuous function.

Proposition 3.4. We assume that, for each $x\in X$, the sequence $(R^{k,x})_{k\ge1}$ satisfies the LDP in $\Omega$ with scale $k$ and the coercive rate function

$$C^x=C+\iota_{\{X_0=x\}}:\Omega\to[0,\infty],$$

where $C:\Omega\to[0,\infty]$ is a lower semicontinuous function.

Then, for any $\mu_0\in P(X)$, we have $\Gamma\text{-}\lim_{k\to\infty}C^{k,\mu_0}=C^{\mu_0}$ in $P(\Omega)$.

Let us define the functions

$$T^k(\mu_0,\nu):=\inf\Big\{\frac1k H(P|R^{k,\mu_0});\ P\in P(\Omega):P_0=\mu_0,\ P_1=\nu\Big\}=\inf\{C^{k,\mu_0}(P);\ P\in P(\Omega):P_1=\nu\},\quad\nu\in P(X)$$

and

$$T(\mu_0,\nu):=\inf\Big\{\int_\Omega C\,dP;\ P\in P(\Omega):P_0=\mu_0,\ P_1=\nu\Big\}=\inf\{C^{\mu_0}(P);\ P\in P(\Omega):P_1=\nu\},\quad\nu\in P(X).$$

Corollary 3.5. Under the assumptions of Proposition 3.4, for any $\mu_0\in P(X)$ we have

$$\Gamma\text{-}\lim_{k\to\infty}T^k(\mu_0,\cdot)=T(\mu_0,\cdot)$$

on $P(X)$. In particular, for any $\mu_1\in P(X)$, there exists a sequence $(\mu_1^k)_{k\ge1}$ such that

$$\lim_{k\to\infty}\mu_1^k=\mu_1$$

in $P(X)$ and $\lim_{k\to\infty}T^k(\mu_0,\mu_1^k)=T(\mu_0,\mu_1)\in[0,\infty]$.
Now, let us consider a sequence of minimization problems which generalizes $(S^k)_{k\ge1}$ of Section 2. For each $k\ge1$, it is given by

$$\frac1k H(P|R^{k,\mu_0})\to\min;\quad P\in P(\Omega):P_0=\mu_0,\ P_1=\mu_1^k, \tag{$S^k$}$$

where $(\mu_1^k)_{k\ge1}$ is a sequence in $P(X)$ as in Corollary 3.5. The dynamic Monge-Kantorovich problem associated with $(S^k)_{k\ge1}$ is

$$\int_\Omega C\,dP\to\min;\quad P\in P(\Omega):P_0=\mu_0,\ P_1=\mu_1. \tag{$\mathrm{MK}_{\mathrm{dyn}}$}$$

The next result states that $(S^k)_{k\ge1}$ tends to $(\mathrm{MK}_{\mathrm{dyn}})$ in a strong sense. The values of $(S^k)_{k\ge1}$ tend to $\inf(\mathrm{MK}_{\mathrm{dyn}})$, and the minimizers of $(S^k)_{k\ge1}$ tend to some minimizers of the limiting problem $(\mathrm{MK}_{\mathrm{dyn}})$.



Theorem 3.6. Under the assumptions of Proposition 3.4, for any $\mu_0,\mu_1\in P(X)$ we have $\lim_{k\to\infty}\inf(S^k)=\inf(\mathrm{MK}_{\mathrm{dyn}})\in[0,\infty]$.

Suppose in addition that $\inf(\mathrm{MK}_{\mathrm{dyn}})<\infty$. Then, for each large enough $k$, $(S^k)$ admits a unique solution $P^k\in P(\Omega)$.

Moreover, any limit point of the sequence $(P^k)_{k\ge1}$ in $P(\Omega)$ is a solution to $(\mathrm{MK}_{\mathrm{dyn}})$. In particular, if $(\mathrm{MK}_{\mathrm{dyn}})$ admits a unique solution $\hat P\in P(\Omega)$, then $\lim_{k\to\infty}P^k=\hat P$ in $P(\Omega)$.



From the dynamic to a static version. Once we have the dynamic results, the static ones can be derived by means of the continuous mapping $P\in P(\Omega)\mapsto(X_0,X_1)_\#P=P_{01}\in P(X^2)$. The LD tool behind this transfer is the contraction principle, which is recalled at Theorem A.2 below. The connection between the dynamic cost $C$ and the static cost $c$ is

$$c(x,y):=\inf\{C(\omega);\ \omega\in\Omega:\omega_0=x,\ \omega_1=y\}\in[0,\infty],\quad(x,y)\in X^2. \tag{17}$$

This identity is connected to the geodesic problem

$$C(\omega)\to\min;\quad\omega\in\Omega:\omega_0=x,\ \omega_1=y. \tag{$G^{xy}$}$$

Since $C^x$ is coercive for all $x\in X$, this problem has at least one solution, called a geodesic path, provided that its value $c(x,y)$ is finite.

The above static results hold true for any $[0,\infty]$-valued function $c$ satisfying the assumptions of Proposition 3.1, even if $c$ is not derived from a dynamic rate function $C$ via the identity (17). Note also that the coerciveness of $C^x$ for all $x\in X$ implies that $y\in X\mapsto c(x,y)$ is coercive. Indeed, the sublevel sets of $c(x,\cdot)$ are continuous projections of the sublevel sets of $C^x$, which are assumed to be compact. Nevertheless, it is not clear at first sight that $c$ is jointly measurable on $X^2$. The next result tells us that it is jointly lower semicontinuous.

The coerciveness of $C^x$ also guarantees that the set of all geodesic paths from $x$ to $y$,

$$\Gamma^{xy}:=\{\omega\in\Omega;\ \omega_0=x,\ \omega_1=y,\ C(\omega)=c(x,y)\},$$

is a compact subset of $\Omega$, which is nonempty as soon as $c(x,y)<\infty$. In particular, it is a Borel measurable subset.



Theorem 3.7. Suppose that the assumptions of Proposition 3.4 are satisfied.

(1) Then the dynamic results Corollary 3.5 and Theorem 3.6 are satisfied with the cost function $C$. The static results Proposition 3.1, Corollary 3.2 and Theorem 3.3 also hold, with the cost function $c$ derived from $C$ by means of (17). Moreover, $c$ is lower semicontinuous and $\inf(\mathrm{MK}_{\mathrm{dyn}})=\inf(\mathrm{MK})\in[0,\infty]$.

Suppose in addition that $\mu_0,\mu_1\in P(X)$ satisfy $\inf(\mathrm{MK}):=T_{01}(\mu_0,\mu_1)<\infty$, so that both (MK) and $(\mathrm{MK}_{\mathrm{dyn}})$ admit a solution.

(2) Then, for all large enough $k\ge1$, $(S^k_{01})$ and $(S^k)$ admit respectively a unique solution $\pi^k\in P(X^2)$ and $P^k\in P(\Omega)$. Furthermore,

$$P^k=\int_{X^2}R^{k,xy}(\cdot)\,\pi^k(dxdy),$$

which means that $P^k$ is the $\pi^k$-mixture of the bridges $R^{k,xy}$ of $R^k$.

(3) The sets of solutions to (MK) and $(\mathrm{MK}_{\mathrm{dyn}})$ are nonempty convex compact subsets of $P(X^2)$ and $P(\Omega)$ respectively.

A probability measure $P\in P(\Omega)$ is a solution to $(\mathrm{MK}_{\mathrm{dyn}})$ if and only if $P_{01}$ is a solution to (MK) and

$$P^{xy}(\Gamma^{xy})=1,\quad\forall(x,y),\ P_{01}\text{-a.e.} \tag{18}$$

In particular, suppose that (MK) admits a unique solution $\pi\in P(X^2)$ and that, for almost every $(x,y)\in X^2$, the geodesic problem $(G^{xy})$ admits a unique solution $\gamma^{xy}\in\Omega$. Then $(\mathrm{MK}_{\mathrm{dyn}})$ admits the unique solution

$$\hat P=\int_{X^2}\delta_{\gamma^{xy}}\,\pi(dxdy)\in P(\Omega),$$

which is the $\pi$-mixture of the Dirac measures at the geodesics $\gamma^{xy}$, and

$$\lim_{k\to\infty}P^k=\hat P\quad\text{in }P(\Omega).$$
Remarks 3.8.

(1) Formula (18) simply means that $P$ only charges geodesic paths. We did not write $P(\Gamma)=1$, because it is not clear that the set $\Gamma:=\bigcup_{(x,y)\in X^2}\Gamma^{xy}$ of all geodesic paths is measurable.

(2) In case of uniqueness, as in the last statement of this theorem, the marginal flow of $\hat P$ is

$$\mu_t:=\hat P_t=\int_{X^2}\delta_{\gamma^{xy}_t}\,\pi(dxdy)\in P(X),\quad t\in[0,1].$$

It is the displacement interpolation between $\mu_0$ and $\mu_1$.

The kernel $(x,y)\mapsto\delta_{\gamma^{xy}}$ is measurable, as a consequence of the abstract disintegration result for probability measures on a Polish space. Equivalently, $(x,y)\mapsto\gamma^{xy}$ is measurable.

(3) If no uniqueness requirement is satisfied, then, for any solution $P$ of $(\mathrm{MK}_{\mathrm{dyn}})$,

$$\mu_t=(X_t)_\#P\in P(X),\quad t\in[0,1],$$

is also a good candidate for being called a displacement interpolation between $\mu_0$ and $\mu_1$.

(4) Whether $(\pi^k)_{k\ge1}$ converges when (MK) admits several solutions is left open in this article. This might hold true, with the entropy minimization approximation selecting a "viscosity solution" of (MK).
Back to Schrödinger's heat bath. We illustrate these general results by means of Example 2.1. A well-known LD result concerns the $\mathbb R^d$-valued process which we have already met at (15). It is defined by

$$Y^{k,x}_t=x+\sqrt{1/k}\,B_t,\quad 0\le t\le1, \tag{19}$$

where the initial condition $Y^{k,x}_0=x$ is deterministic and $B=(B_t)_{0\le t\le1}$ is the Wiener process on $\mathbb R^d$. Here we take $\sigma^2=1/k$, with $k\ge1$ an integer.

Theorem 3.9 (Schilder's theorem). The sequence of random processes $(Y^{k,x})_{k\ge1}$ satisfies the LDP in $\Omega=C([0,1],\mathbb R^d)$, equipped with the topology of uniform convergence, with scale $k$ and rate function

$$C^x(\omega)=\int_{[0,1]}\frac{|\dot\omega_t|^2}{2}\,dt\in[0,\infty],\quad\omega\in\Omega,$$

if $\omega_0=x$ and $\omega$ is an absolutely continuous path (its derivative is denoted by $\dot\omega$), and $C^x(\omega)=\infty$ otherwise.

For a proof, see [DZ98, Thm 5.2.3].
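With our notation, this corresponds to

$$C(\omega)=\begin{cases}\int_{[0,1]}\frac{|\dot\omega_t|^2}{2}\,dt\in[0,\infty] & \text{if }\omega\in\Omega_{ac}\\ \infty & \text{otherwise}\end{cases},\quad\omega\in\Omega,$$

where $\Omega_{ac}$ is the space of all absolutely continuous paths $\omega:[0,1]\to\mathbb R^d$. By Jensen's inequality, (17) leads us to

$$c(x,y)=|y-x|^2/2,\quad x,y\in\mathbb R^d,$$

which is the well-known quadratic transport cost. Let $R^{k,x}\in P(\Omega)$ denote the law of $Y^{k,x}$. Then $R^{k,\mu_0}(\cdot)=\int_{\mathbb R^d}R^{k,x}(\cdot)\,\mu_0(dx)\in P(\Omega)$ is the law of

$$Y^k_t=Y_0+\sqrt{1/k}\,B_t,\quad0\le t\le1,$$

with initial law $\mathrm{Law}(Y_0)=\mu_0\in P(\mathbb R^d)$, $Y_0$ being independent of $B$. Also denote $p^{k,\mu_0}=(X_0,X_1)_\#R^{k,\mu_0}\in P(\mathbb R^d\times\mathbb R^d)$, that is,

$$p^{k,\mu_0}(dxdy)=\mu_0(dx)\,(2\pi/k)^{-d/2}\exp\Big(-k\frac{|y-x|^2}{2}\Big)\,dy.$$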



Suppose there exists some $\pi^*\in P(\mathbb R^d\times\mathbb R^d)$ such that $\pi^*_0=\mu_0$, $\pi^*_1=\mu_1$ and $\int_{\mathbb R^d\times\mathbb R^d}|y-x|^2\,\pi^*(dxdy)<\infty$. The above results then tell us the following.

- $T_{01}(\mu_0,\mu_1)<\infty$.
- There exists a sequence $(\mu_1^k)_{k\ge1}$ such that $\lim_{k\to\infty}\mu_1^k=\mu_1$ in $P(\mathbb R^d)$, and, for any large enough $k\ge1$, the entropy minimization problem

$$\frac1k H(\pi|p^{k,\mu_0})\to\min;\quad\pi\in P(\mathbb R^d\times\mathbb R^d),\ \pi_0=\mu_0,\ \pi_1=\mu_1^k$$

admits a unique solution $\pi^k\in P(\mathbb R^d\times\mathbb R^d)$.
- The sequence $(\pi^k)_{k\ge1}$ admits at least one limit point in $P(\mathbb R^d\times\mathbb R^d)$, and any such limit point is a solution of the Monge-Kantorovich quadratic transport problem

$$\int_{\mathbb R^d\times\mathbb R^d}\frac{|y-x|^2}{2}\,\pi(dxdy)\to\min;\quad\pi\in P(\mathbb R^d\times\mathbb R^d),\ \pi_0=\mu_0,\ \pi_1=\mu_1.$$

- In addition, $\lim_{k\to\infty}\frac1k H(\pi^k|p^{k,\mu_0})=T_{01}(\mu_0,\mu_1)$.
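The entropic problem above can be approximated numerically on finite supports. Its minimizer has the product form $\pi^k(x,y)=a(x)\,e^{-k|y-x|^2/2}\,b(y)$, so it can be computed by iterative proportional fitting (Sinkhorn's algorithm).

The sketch below is our own illustration and is not a method used in this article. It rests on these assumptions:

- $d=1$, and $\mu_0$, $\mu_1$ are uniform on five points.
- The Gaussian kernel is replaced by its restriction to the finite support of $\mu_1$. A finite support is needed because, as explained in Remark 4.5, a discrete $\pi_1$ cannot be absolutely continuous with respect to the Gaussian reference itself.
- The iterations run in the log domain, with an increasing schedule of $k$ ("$\varepsilon$-scaling") for robustness at large $k$.

The resulting transport cost is compared with the Monge-Kantorovich value. In dimension one, that value is attained by the monotone (sorted) coupling. The function names are hypothetical.

```python
import numpy as np


def _softmin(a, w, k, axis):
    """Stable -(1/k) log sum_j w_j exp(-k a_j) along `axis`."""
    b = -k * a + np.log(w)
    m = np.max(b, axis=axis, keepdims=True)
    lse = np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(b - m), axis=axis))
    return -lse / k


def entropic_plan(x, y, mu0, mu1, k, n_iter=500, k_start=1.0):
    """Hypothetical helper: minimize (1/k) H(pi | mu0 x mu1 * exp(-k c)) subject to
    pi_0 = mu0, pi_1 = mu1, with c(x, y) = |y - x|^2 / 2 on finite supports in R.
    Log-domain Sinkhorn iterations, warm-started along k = k_start, 2 k_start, ..., k."""
    c = (y[None, :] - x[:, None]) ** 2 / 2
    phi, psi = np.zeros(len(x)), np.zeros(len(y))   # dual potentials (cost units)
    schedule, kk = [], k_start
    while kk < k:
        schedule.append(kk)
        kk *= 2
    schedule.append(k)
    for kk in schedule:
        for _ in range(n_iter):
            # phi_i = softmin_j (c_ij - psi_j): enforces the first marginal mu0
            phi = _softmin(c - psi[None, :], mu1[None, :], kk, axis=1)
            # psi_j = softmin_i (c_ij - phi_i): enforces the second marginal mu1
            psi = _softmin(c - phi[:, None], mu0[:, None], kk, axis=0)
    # plan pi_ij = mu0_i mu1_j exp(k (phi_i + psi_j - c_ij))
    return mu0[:, None] * mu1[None, :] * np.exp(k * (phi[:, None] + psi[None, :] - c))


def transport_cost(pi, x, y):
    """int |y - x|^2 / 2 d pi for a plan on the finite grid x times y."""
    return float(np.sum(pi * (y[None, :] - x[:, None]) ** 2 / 2))


x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([0.2, 0.5, 0.9, 1.3, 1.8])
w = np.full(5, 0.2)
# In dimension one with a strictly convex cost, the sorted coupling is optimal.
mk_value = float(np.mean((np.sort(y) - np.sort(x)) ** 2 / 2))
for k in (1, 16, 256):
    print(k, transport_cost(entropic_plan(x, y, w, w, k), x, y), mk_value)
```

As $k$ grows, the printed costs decrease towards `mk_value`, in line with Theorem 3.3 in this discretized setting.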

Moreover, for all large enough $k\ge1$, the corresponding dynamic problem

$$\frac1k H(P|R^{k,\mu_0})\to\min;\quad P\in P(\Omega),\ P_0=\mu_0,\ P_1=\mu_1^k$$

has a unique solution $P^k\in P(\Omega)$, given by

$$P^k=\int_{\mathbb R^d\times\mathbb R^d}R^{k,xy}(\cdot)\,\pi^k(dxdy).$$

The sequence $(P^k)_{k\ge1}$ admits at least one limit point in $P(\Omega)$, and any such limit point is a solution of the dynamic Monge-Kantorovich quadratic transport problem

$$\int_\Omega\Big(\int_{[0,1]}\frac{|\dot\omega_t|^2}{2}\,dt\Big)\,P(d\omega)\to\min;\quad P\in P(\Omega_{ac}),\ P_0=\mu_0,\ P_1=\mu_1.$$



When $\mu_0$ or $\mu_1$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb R^d$, it is well known [Bre91, McC95] that (MK) admits a unique solution $\pi$. By Theorem 3.7, we obtain $\lim_{k\to\infty}P^k=\hat P$ in $P(\Omega)$, where

$$\hat P(\cdot)=\int_{\mathbb R^d\times\mathbb R^d}\delta_{[(1-t)x+ty]_{0\le t\le1}}(\cdot)\,\pi(dxdy)\in P(\Omega).$$

The corresponding displacement interpolation is the marginal flow of $\hat P$, which is

$$\mu_t(\cdot)=\hat P_t(\cdot)=\int_{\mathbb R^d\times\mathbb R^d}\delta_{(1-t)x+ty}(\cdot)\,\pi(dxdy)\in P(\mathbb R^d),\quad t\in[0,1].$$

Moreover, the marginal flow of $P^k$ is

$$\mu^k_t(\cdot)=P^k_t(\cdot)=\int_{\mathbb R^d\times\mathbb R^d}R^{k,xy}_t(\cdot)\,\pi^k(dxdy)\in P(\mathbb R^d),\quad t\in[0,1],$$

and for each $t\in[0,1]$, $\lim_{k\to\infty}\mu^k_t=\mu_t$ in $P(\mathbb R^d)$.
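In this heat-bath example, the bridge marginals are explicit. Under $R^{k,xy}$, the position at time $t$ is Gaussian with mean $(1-t)x+ty$ and covariance $\frac{t(1-t)}{k}\mathrm{Id}$; this is a standard property of the Brownian bridge of $B/\sqrt k$. The sketch below samples from $\int R^{k,xy}_t\,\rho(dxdy)$ for a given coupling $\rho$. For illustration, $\rho$ is an arbitrary deterministic coupling $y=2x+1$, not an optimal plan. The sketch shows the Gaussian spread around the straight-line interpolation shrinking like $k^{-1/2}$. The helper name is hypothetical.

```python
import numpy as np


def sample_bridge_marginal(pairs, t, k, rng):
    """Draw X_t under the mixture of Brownian bridges R^{k,xy}, one draw per pair.
    pairs: array of shape (n, 2, d) holding the endpoints (x, y). Hypothetical helper."""
    x, y = pairs[:, 0, :], pairs[:, 1, :]
    mean = (1 - t) * x + t * y                 # point on the straight segment from x to y
    std = np.sqrt(t * (1 - t) / k)             # bridge standard deviation at time t
    return mean + std * rng.standard_normal(x.shape)


rng = np.random.default_rng(0)
x0 = rng.standard_normal((10_000, 1))
pairs = np.stack([x0, 2 * x0 + 1], axis=1)     # illustrative coupling y = 2x + 1
t = 0.5
target = (1 - t) * pairs[:, 0, :] + t * pairs[:, 1, :]
for k in (1, 100, 10_000):
    xt = sample_bridge_marginal(pairs, t, k, rng)
    # root-mean-square distance to the interpolation, close to sqrt(t(1-t)/k)
    print(k, np.sqrt(np.mean((xt - target) ** 2)))
```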

Mikami's paper [Mik04] is set in the context of Schrödinger's heat bath based on the Wiener process, as above. Neither relative entropy nor $\Gamma$-convergence enters the statements of [Mik04]'s results. Nevertheless, some of the previous results about Schrödinger's heat bath are close to the main results of [Mik04], which are proved by means of cyclical monotonicity from a stochastic optimal control point of view.

Theorems 3.3, 3.6 and 3.7 apply to a large class of optimal transport costs, see Section 4. They shed new light on [Mik04]'s results and extend them in two ways: the reference process $R$ is not restricted to be the Wiener process, and the LD principle satisfied by $(R^k)_{k\ge1}$ is not restricted to the setting of Schilder's theorem.

4. From stochastic processes to transport cost functions 

We have just seen that Schilder's theorem leads to the quadratic cost function. The aim of this section is to present a series of examples of LD sequences $(R^k)_{k\ge1}$ in $P(\Omega)$ which give rise to cost functions $c$ on $X^2$.

Simple random walks on $\mathbb R^d$. Instead of (19), let us consider

$$Y^{k,x}_t=x+W^k_t,\quad0\le t\le1, \tag{20}$$

where, for each $k\ge1$, $W^k$ is a random walk. The law of $Y^{k,x}$ is our $R^{k,x}\in P(\Omega)$.

To build these random walks, we need a sequence $(Z_m)_{m\ge1}$ of independent copies of a random variable $Z$ in $\mathbb R^d$. For each integer $k\ge1$, $W^k$ is the rescaled random walk defined for all $0\le t\le1$ by

$$W^k_t=\frac1k\sum_{j=1}^{\lfloor kt\rfloor}Z_j, \tag{21}$$
where $\lfloor kt\rfloor$ is the integer part of $kt$. This sequence satisfies an LDP which is given by Mogulskii's theorem. We recall its statement, which is also an opportunity to set some notation. The logarithm of the Laplace transform of the law $m_Z\in P(\mathbb R^d)$ of $Z$ is $\zeta\mapsto\log\int_{\mathbb R^d}e^{\zeta\cdot z}\,m_Z(dz)$. Its convex conjugate is

$$c_Z(v):=\sup_{\zeta\in\mathbb R^d}\Big\{\zeta\cdot v-\log\int_{\mathbb R^d}e^{\zeta\cdot z}\,m_Z(dz)\Big\},\quad v\in\mathbb R^d. \tag{22}$$

One can prove, see [DZ98], that $c_Z$ is a convex $[0,\infty]$-valued function which attains its minimum value at $v=\mathbb EZ=\int_{\mathbb R^d}z\,m_Z(dz)$. Moreover, the closure $\operatorname{cl}\{c_Z<\infty\}$ of its effective domain is the closed convex hull of the topological support $\operatorname{supp}m_Z$ of the probability measure $m_Z$. Under assumption (23) below, $c_Z$ is also strictly convex.

For each initial value $x\in\mathbb R^d$, we define the action functional

$$C^x_Z(\omega):=\begin{cases}\int_{[0,1]}c_Z(\dot\omega_t)\,dt & \text{if }\omega\in\Omega_{ac}\text{ and }\omega_0=x\\ +\infty & \text{otherwise}\end{cases},\quad\omega\in\Omega.$$
Theorem 4.1 (Mogulskii's theorem). Assume that

$$\int_{\mathbb R^d}e^{\zeta\cdot z}\,m_Z(dz)<+\infty,\quad\forall\zeta\in\mathbb R^d. \tag{23}$$

Then, for each $x\in\mathbb R^d$, the sequence $(R^{k,x})_{k\ge1}$ of the laws of $(Y^{k,x})_{k\ge1}$ specified by (20) satisfies the LDP in $\Omega=D([0,1],\mathbb R^d)$, equipped with its natural $\sigma$-field and the topology of uniform convergence, with scale $k$ and the coercive rate function $C^x_Z$.

For a proof, see [DZ98, Thm 5.1.2]. This result corresponds to our general setting with

$$C(\omega)=C_Z(\omega):=\begin{cases}\int_{[0,1]}c_Z(\dot\omega_t)\,dt & \text{if }\omega\in\Omega_{ac}\\ +\infty & \text{otherwise}\end{cases},\quad\omega\in\Omega. \tag{24}$$

Since $c_Z$ is a strictly convex function, the unique solution of the geodesic problem $(G^{xy})$ is the constant velocity geodesic

$$\sigma^{xy}:t\in[0,1]\mapsto(1-t)x+ty\in\mathbb R^d. \tag{25}$$
Now, let us only consider the final position $Y^{k,x}_1$. Denote by $p^{k,x}=(X_1)_\#R^{k,x}\in P(\mathbb R^d)$ the law of $Y^{k,x}_1$. By the contraction principle (see Theorem A.2 in the Appendix), Mogulskii's theorem immediately yields the simplest result of LD theory, namely Cramér's theorem.

Corollary 4.2 (A complicated version of Cramér's theorem). Under assumption (23), for each $x\in\mathbb R^d$, the sequence $(p^{k,x})_{k\ge1}$ of the laws of $(Y^{k,x}_1)_{k\ge1}$ satisfies the LDP in $\mathbb R^d$ with scale $k$ and the coercive rate function

$$y\in\mathbb R^d\mapsto c_Z(y-x)\in[0,\infty],$$

where $c_Z$ is given at (22).

Furthermore, $c_Z(v)=\inf\{C^x_Z(\omega);\ \omega\in\Omega:\omega_0=x,\ \omega_1=x+v\}$ for all $x,v\in\mathbb R^d$.

The last identity is a simple consequence of Jensen's inequality, which also led us to (25) a few lines earlier. Cramér's theorem corresponds to the case $x=0$, where only the deviations of $Y^{k,0}_1=\frac1k\sum_{j=1}^kZ_j$ in $\mathbb R^d$ are considered.
Theorem 4.3 (Cramér's theorem). Under assumption (23), the sequence $\big(\frac1k\sum_{j=1}^kZ_j\big)_{k\ge1}$ satisfies the LDP in $\mathbb R^d$ with scale $k$ and the coercive rate function $c_Z$ given at (22).

For a proof, see [DZ98, Thm 2.2.30].
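Theorem 4.3 can be checked numerically in the simplest case of Example 4.4 (2), where $Z=\pm1$ with probability $1/2$. There, $\mathbb P\big(\frac1k\sum_{j\le k}Z_j\ge v\big)$ is an explicit binomial sum. For $0<v<1$, the infimum of $c_Z$ over $[v,1]$ is $c_Z(v)$, because $c_Z$ is increasing on $[0,1]$. The Chernoff bound gives $-\frac1k\log\mathbb P(\cdot)\ge c_Z(v)$ for every $k$, and the gap should shrink as $k$ grows. The sketch below is illustrative only, and its function names are ours.

```python
import math


def c_coin(v):
    """Cramer transform of Z = +-1 with probability 1/2, for -1 < v < 1 (Example 4.4 (2))."""
    return ((1 + v) * math.log(1 + v) + (1 - v) * math.log(1 - v)) / 2


def log_tail(k, v):
    """Exact log P((Z_1 + ... + Z_k) / k >= v) from binomial weights."""
    # the empirical mean is >= v iff the number m of +1's satisfies 2m - k >= k v
    m0 = math.ceil(k * (1 + v) / 2)
    logs = [math.lgamma(k + 1) - math.lgamma(m + 1) - math.lgamma(k - m + 1) - k * math.log(2)
            for m in range(m0, k + 1)]
    top = max(logs)                              # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(term - top) for term in logs))


v = 0.5
for k in (10, 100, 1000, 10_000):
    print(k, -log_tail(k, v) / k, c_coin(v))     # the first column decreases to c_coin(v)
```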

We have just described a general procedure which converts the law $m_Z\in P(\mathbb R^d)$ into the cost functions $C_Z$ and $c_Z$. Here are some examples with explicit computations.

Examples 4.4. We recall some well-known examples of the Cramér transform $c_Z$.

(1) To obtain the quadratic cost function $c_Z(v)=|v|^2/2$, choose $Z$ as a standard normal random vector in $\mathbb R^d$: $m_Z(dz)=(2\pi)^{-d/2}\exp(-|z|^2/2)\,dz$.

(2) Taking $Z$ such that $\mathbb P(Z=+1)=\mathbb P(Z=-1)=1/2$, i.e. $m_Z=(\delta_{-1}+\delta_{+1})/2$, leads to

$$c_Z(v)=\begin{cases}[(1+v)\log(1+v)+(1-v)\log(1-v)]/2, & \text{if }-1<v<+1\\ \log2, & \text{if }v\in\{-1,+1\}\\ +\infty, & \text{if }v\notin[-1,+1].\end{cases}$$

(3) If $Z$ has an exponential law with expectation 1, i.e. $m_Z(dz)=\mathbf 1_{\{z\ge0\}}e^{-z}\,dz$, then $c_Z(v)=v-1-\log v$ if $v>0$, and $c_Z(v)=+\infty$ if $v\le0$.

(4) If $Z$ has a Poisson law with expectation 1, i.e. $m_Z(dz)=e^{-1}\sum_{n\ge0}\delta_n(dz)/n!$, then $c_Z(v)=v\log v-v+1$ if $v>0$, $c_Z(0)=1$, and $c_Z(v)=+\infty$ if $v<0$.
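The closed forms in Examples 4.4 (2) to (4) can be cross-checked by evaluating the supremum in (22) numerically on a finite grid of $\zeta$. The log-Laplace transforms used below are $\log\cosh\zeta$ for $Z=\pm1$, $-\log(1-\zeta)$ for $\zeta<1$ in the exponential case, and $e^\zeta-1$ in the Poisson case. This is a crude sketch. It assumes that the maximizing $\zeta$ lies inside the grid, which holds for the test points used here. Names are ours.

```python
import numpy as np

# zeta -> log E[exp(zeta Z)] for Examples 4.4 (2)-(4)
LOG_LAPLACE = {
    "coin": lambda z: np.log(np.cosh(z)),        # Z = +-1 with probability 1/2
    "exponential": lambda z: -np.log(1 - z),     # Exp(1); finite only for zeta < 1
    "poisson": lambda z: np.exp(z) - 1,          # Poisson(1)
}

CLOSED_FORMS = {
    "coin": lambda v: ((1 + v) * np.log(1 + v) + (1 - v) * np.log(1 - v)) / 2,
    "exponential": lambda v: v - 1 - np.log(v),
    "poisson": lambda v: v * np.log(v) - v + 1,
}


def legendre(log_laplace, v, zetas):
    """Crude numerical version of (22): maximize zeta v - log-Laplace over a finite grid."""
    return float(np.max(zetas * v - log_laplace(zetas)))


zeta_grid = np.linspace(-20.0, 20.0, 400_001)
zeta_exp = np.linspace(-20.0, 0.999, 400_001)   # stays inside {zeta < 1} for Exp(1)

for v in (0.5, 0.9):
    print(v, legendre(LOG_LAPLACE["coin"], v, zeta_grid), CLOSED_FORMS["coin"](v))
    print(v, legendre(LOG_LAPLACE["exponential"], v, zeta_exp), CLOSED_FORMS["exponential"](v))
    print(v, legendre(LOG_LAPLACE["poisson"], v, zeta_grid), CLOSED_FORMS["poisson"](v))
```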

We have $c_Z(0)=0$ if and only if $\mathbb EZ:=\int_{\mathbb R^d}z\,m_Z(dz)=0$. More generally, $c_Z(v)\in[0,+\infty]$, and $c_Z(v)=0$ if and only if $v=\mathbb EZ$. We also have

$$c_{AZ+b}(v)=c_Z(A^{-1}(v-b))$$

for every invertible linear operator $A$ and every $b\in\mathbb R^d$.

If $\mathbb EZ=0$, then $c_Z$ is quadratic at the origin, since $c_Z(v)=v\cdot\Gamma_Z^{-1}v/2+o(|v|^2)$, where $\Gamma_Z$ is the covariance matrix of $Z$. This rules out the usual costs $c(v)=|v|^p$ with $p\ne2$.

Nevertheless, taking $Z$ a real-valued variable with density $C\exp(-|z|^p/p)$ with $p\ge1$ leads to $c_Z(v)=\frac{|v|^p}{p}\big(1+o_{|v|\to\infty}(1)\big)$. The case $p=1$ follows from Example 4.4 (3) above. To see that the result still holds with $p>1$, use the Laplace method to compute, as $\zeta$ tends to infinity, the principal part of

$$\int_0^\infty e^{-z^p/p}e^{\zeta z}\,dz=\sqrt{2\pi(q-1)}\,\zeta^{q/2-1}e^{\zeta^q/q}\big(1+o_{\zeta\to+\infty}(1)\big),$$

where $1/p+1/q=1$.
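The Laplace-method asymptotic above can be checked numerically. The sketch below is ours. It integrates $e^{\zeta z-z^p/p}$ with the trapezoidal rule, after factoring out $e^{\zeta^q/q}$ to avoid overflow, and compares the result with $\sqrt{2\pi(q-1)}\,\zeta^{q/2-1}$. The upper integration bound $4z_*+10$, where $z_*=\zeta^{q-1}$ is the maximizer, is an assumption chosen so that the integrand is negligible beyond it.

```python
import numpy as np


def scaled_integral(p, zeta, n=1_000_001):
    """exp(-zeta^q / q) * int_0^inf exp(zeta z - z^p / p) dz, by the trapezoidal rule."""
    q = p / (p - 1)
    z_star = zeta ** (q - 1)                    # maximizer of zeta z - z^p / p
    z = np.linspace(0.0, 4.0 * z_star + 10.0, n)
    f = np.exp(zeta * z - z ** p / p - zeta ** q / q)   # <= 1, so no overflow
    h = z[1] - z[0]
    return float(h * (f.sum() - 0.5 * (f[0] + f[-1])))


def laplace_prediction(p, zeta):
    """Principal part sqrt(2 pi (q - 1)) zeta^(q/2 - 1) given by the Laplace method."""
    q = p / (p - 1)
    return float(np.sqrt(2 * np.pi * (q - 1)) * zeta ** (q / 2 - 1))


for p in (1.5, 3.0):
    print(p, scaled_integral(p, 50.0) / laplace_prediction(p, 50.0))   # ratio close to 1
```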

A related $d$-dimensional result follows by considering $Z$ with density $C\exp(-\|z\|_p^p/p)$, where $\|z\|_p^p=\sum_{i\le d}|z_i|^p$. This gives $c_Z(v)=\frac{\|v\|_p^p}{p}\big(1+o_{|v|\to\infty}(1)\big)$.

Remark 4.5. Let $R^k$ be defined by (20) and (21), where $Z$ only takes isolated values, as in Examples 4.4 (2) and (4). Suppose that $\mu_0$ has a discrete support. Then $R^{k,\mu_0}_1$ also has a discrete support. It follows that any $P\in P(\Omega)$ which is absolutely continuous with respect to $R^{k,\mu_0}$ is such that $P_1$ has a discrete support.

Now, if one chooses a diffuse measure for $\mu_1$, the non-modified minimization problem (the one with the fixed constraint $P_1=\mu_1$) has no solution. It is therefore necessary to introduce a sequence $(\mu_1^k)_{k\ge1}$ of discrete measures such that $\lim_{k\to\infty}\mu_1^k=\mu_1$. Only then do the sequences of modified entropy minimization problems $(S^k_{01})_{k\ge1}$ and $(S^k)_{k\ge1}$ admit solutions.



Nonlinear transformations. By means of the contraction principle (Theorem A.2), we can twist the cost functions obtained earlier. We only present some examples to illustrate this technique.

The static case. Here, we only consider the LD of the final position $Y^k_1$. We have just remarked that the cost functions $c_Z$ above are necessarily quadratic at the origin. This drawback will be partly overcome by means of continuous transformations.
Consider the example

$$Y^{k,x}=x+V^k,$$

where $(V^k)_{k\ge1}$ satisfies an LDP which is not given by Cramér's theorem. Let $(Z_j)_{j\ge1}$ be as above and let $\alpha$ be any continuous mapping on $\mathbb R^d$. Consider

$$V^k=\alpha\Big(\frac1k\sum_{1\le j\le k}Z_j\Big).$$

As a consequence of the contraction principle, we obtain $c(v)=\inf\{c_Z(u);\ u\in\mathbb R^d,\ \alpha(u)=v\}$, $v\in\mathbb R^d$. In particular, if $\alpha$ is a continuous injective mapping, then

$$c=c_Z\circ\alpha^{-1}. \tag{26}$$

For instance, let $Z$ be a standard normal vector as in Example 4.4 (1). The empirical mean $\frac1k\sum_{1\le j\le k}Z_j$ of independent copies of $Z$ is then a centered normal vector with covariance $\mathrm{Id}/k$. For each $p>0$, define $\alpha_p$ on $\mathbb R^d$ by $\alpha_p(v)=2^{-1/p}|v|^{2/p-1}v$. Taking $\alpha=\alpha_p$ leads us to

$$V^k\overset{\mathcal L}{=}(2k)^{-1/p}|Z|^{2/p-1}Z. \tag{27}$$

The equality in law $\overset{\mathcal L}{=}$ simply means that both sides share the same distribution. The mapping $\alpha_p$ has been chosen to obtain, with (26),

$$c(v):=c_p(v)=|v|^p,\quad v\in\mathbb R^d.$$

Note that $V^k$ has the same law as $k^{-1/p}Z_p$, where $Z_p:=2^{-1/p}|Z|^{2/p-1}Z$ has a density of the form $K|z|^{d(p/2-1)}\exp(-|z|^p)$ for some normalizing constant $K$.
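The two identities behind (26) and (27) for $\alpha_p$ can be checked mechanically.

- $\alpha_p(Z/\sqrt k)=(2k)^{-1/p}|Z|^{2/p-1}Z$ holds pointwise. Hence (27) follows from $\frac1k\sum_jZ_j\overset{\mathcal L}{=}Z/\sqrt k$.
- With $c_Z(u)=|u|^2/2$ from Example 4.4 (1) and $\alpha_p^{-1}(v)=\sqrt2\,|v|^{p/2-1}v$, we get $c_Z(\alpha_p^{-1}(v))=|v|^p$.

The sketch below verifies both on random samples. The function names are ours.

```python
import numpy as np


def alpha_p(v, p):
    """alpha_p(v) = 2^{-1/p} |v|^{2/p - 1} v on the last axis, with alpha_p(0) = 0."""
    r = np.linalg.norm(v, axis=-1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        scale = np.where(r > 0, 2.0 ** (-1 / p) * r ** (2 / p - 1), 0.0)
    return scale * v


def alpha_p_inv(v, p):
    """Inverse map alpha_p^{-1}(v) = sqrt(2) |v|^{p/2 - 1} v."""
    r = np.linalg.norm(v, axis=-1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        scale = np.where(r > 0, np.sqrt(2.0) * r ** (p / 2 - 1), 0.0)
    return scale * v


rng = np.random.default_rng(1)
Z = rng.standard_normal((1000, 3))               # samples of a standard normal vector in R^3
k, p = 50, 3.0
lhs = alpha_p(Z / np.sqrt(k), p)                 # alpha_p applied to a N(0, Id/k) vector
rhs = (2 * k) ** (-1 / p) * np.linalg.norm(Z, axis=-1, keepdims=True) ** (2 / p - 1) * Z
v = rng.standard_normal((1000, 3))
cost = 0.5 * np.sum(alpha_p_inv(v, p) ** 2, axis=-1)   # c_Z(alpha_p^{-1}(v)) with c_Z = |.|^2/2
print(np.max(np.abs(cost - np.linalg.norm(v, axis=-1) ** p)))
```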

The dynamic case. We now look at an example where

$$Y^{k,x}_t=x+V^k_t,\quad0\le t\le1, \tag{28}$$

and where $(V^k)_{k\ge1}$ satisfies an LDP in $\Omega$ which is not given by Mogulskii's theorem.

We present examples of dynamics $V^k$ based on the standard Brownian motion $B=(B_t)_{0\le t\le1}$ in $\mathbb R^d$. In these examples, the path space can be restricted to $\Omega=C([0,1],\mathbb R^d)$, equipped with the uniform topology. Item (1) is already known to us; we recall it for the comfort of the reader.

Examples 4.6.

(1) An important example is given by

$$V^k_t=k^{-1/2}B_t,\quad0\le t\le1.$$

Schilder's theorem states that $(V^k)_{k\ge1}$ satisfies the LDP in $\Omega$ with the coercive rate function

$$C^0(\omega)=\begin{cases}\int_0^1|\dot\omega_t|^2/2\,dt & \text{if }\omega\in\Omega_{ac},\ \omega_0=0\\ +\infty & \text{otherwise.}\end{cases}$$

As in Example 4.4 (1), it corresponds to the quadratic cost function $|v|^2/2$, but with different dynamics.
(2) More generally, for $p>0$, we have just seen that

$$V^k_t=(2k)^{-1/p}|B_t|^{2/p-1}B_t,\quad0\le t\le1,$$

corresponds to the power cost function $c_p(v)=|v|^p$, $v\in\mathbb R^d$, since $V^k_1\overset{\mathcal L}{=}V^k$ as in (27). The associated dynamic cost is given for all $\omega\in\Omega$ by

$$C^p(\omega)=\begin{cases}\frac{p^2}{4}\int_{[0,1]}|\omega_t|^{p-2}|\dot\omega_t|^2\,dt & \text{if }\omega\in\Omega_{ac},\ \omega_0=0\\ +\infty & \text{otherwise.}\end{cases}$$

(3) Similarly, for $p>0$, the dynamics

$$\tilde V^k_t=(2k)^{-1/p}|B_t/t|^{2/p-1}B_t,\quad0\le t\le1,$$

also correspond to the power cost function $c_p(v)=|v|^p$, $v\in\mathbb R^d$, since

$$\tilde V^k_1\overset{\mathcal L}{=}V^k$$

as in (27). This time, however, the associated dynamic cost is given for all $\omega\in\Omega$ by

$$\tilde C^p(\omega)=\begin{cases}\frac14\int_{(0,1]}\mathbf 1_{\{\omega_t\ne0\}}\Big|\frac{\omega_t}{t}\Big|^p\,\Big|(2-p)\frac{\omega_t}{|\omega_t|}+pt\frac{\dot\omega_t}{|\omega_t|}\Big|^2\,dt & \text{if }\omega\in\Omega_{ac},\ \omega_0=0\\ +\infty & \text{otherwise.}\end{cases}$$

Recall that a geodesic path from $x$ to $y$ is some $\omega\in\Omega_{ac}$ which solves the minimization problem $(G^{xy})$. It is well known that the geodesic paths for Item (1) are the constant velocity paths $\sigma^{xy}$, see (25). The geodesic paths for Item (2) are still straight lines, but with a time-dependent velocity (except for $p=2$). The geodesic paths for Item (3), on the other hand, are the constant velocity paths.
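For Item (2), the contrast can be seen by a direct computation in dimension one, or along a ray through the origin. We add it for illustration, with $p>1$ and endpoint $v$; recall that $\omega_0=0$. The path $\omega_t=t^{2/p}v$ is the image under $\alpha_p$ of the straight line $t\mapsto t\,\alpha_p^{-1}(v)$, and

$$C^p\big(t\mapsto t^{2/p}v\big)=\frac{p^2}{4}\int_0^1 t^{\frac2p(p-2)}|v|^{p-2}\,\frac{4}{p^2}\,t^{\frac4p-2}|v|^2\,dt=|v|^p=c_p(v),$$

$$C^p\big(t\mapsto tv\big)=\frac{p^2}{4}\int_0^1 t^{p-2}|v|^p\,dt=\frac{p^2}{4(p-1)}\,|v|^p\ \ge\ |v|^p.$$

The last inequality is equivalent to $(p-2)^2\ge0$, so it is an equality only for $p=2$. For $p\neq2$, the constant velocity path from $0$ to $v$ is therefore not a geodesic for Item (2).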

Modified random walks on $\mathbb R^d$. Simple random walks correspond to (28) with $V^k=W^k$ given by (21). We introduce a generalization defined by (28) with

$$V^k_t=\alpha_t(W^k_t),\quad0\le t\le1.$$

Here $\alpha:(t,v)\in[0,1]\times\mathbb R^d\mapsto\alpha_t(v)\in\mathbb R^d$ is a continuous map such that $\alpha_0(0)=0$ (remark that $W^k_0=0$ almost surely), and $\alpha_t$ is injective for all $0\le t\le1$.

For all $x\in\mathbb R^d$ and all $k\ge1$, the random path $Y^{k,x}=x+V^k$ satisfies

$$Y^{k,x}=\Phi(W^{k,x}),$$

where $W^{k,x}=x+W^k$. The map $\Phi:\Omega\to\Omega$ is the bicontinuous injective mapping given for all $\omega\in\Omega$ by $\Phi(\omega)=(\Phi_t(\omega))_{0\le t\le1}$, where

$$\Phi_t(\omega)=\omega_0+\alpha_t(\omega_t-\omega_0),\quad0\le t\le1.$$

As for (21), the LD rate function of $(Y^{k,x})_{k\ge1}$ is $C^x=C+\iota_{\{X_0=x\}}$, where

$$C=C_Z\circ\Phi^{-1}$$

and $C_Z$ is given at (24). It is easy to see that, for all $\eta\in\Omega$, $\Phi^{-1}(\eta)=(\Phi^{-1}_t(\eta))_{0\le t\le1}$, where $\Phi^{-1}_t(\eta)=\eta_0+\beta_t(\eta_t-\eta_0)$ for all $0\le t\le1$, with $\beta_t:=\alpha_t^{-1}$. Assuming that $\beta$ is differentiable on $(0,1]\times\mathbb R^d$, we obtain

$$C(\omega)=\begin{cases}\int_{[0,1]}c_Z\big(\partial_t\beta_t(\omega_t-\omega_0)+\nabla\beta_t(\omega_t-\omega_0)\cdot\dot\omega_t\big)\,dt & \text{if }\omega\in\Omega_{ac}\\ +\infty & \text{otherwise}\end{cases},\quad\omega\in\Omega.$$

For each $x,y\in\mathbb R^d$, $(G^{xy})$ admits a unique solution $\gamma^{xy}$, determined by the equation $\Phi^{-1}(\gamma^{xy})=\sigma^{x,x+\beta_1(y-x)}$, where $\sigma$ is the constant velocity geodesic, see (25). That is,

$$\gamma^{xy}_t=x+\alpha_t\big(t\beta_1(y-x)\big),\quad0\le t\le1.$$

The corresponding static cost function $c$ specified by (17) is

$$c(x,y)=C(\gamma^{xy}),\quad x,y\in\mathbb R^d.$$

When $\alpha$ does not depend on $t$, we see that, for all $x,y\in\mathbb R^d$,

$$c(x,y)=C(\gamma^{xy})=C_Z\big(\sigma^{x,x+\beta(y-x)}\big)=c_Z\big(\alpha^{-1}(y-x)\big),$$

which is (26). However, the velocity of the geodesic path,

$$\dot\gamma^{xy}_t=\nabla\alpha\big(t\alpha^{-1}(y-x)\big)\cdot\alpha^{-1}(y-x),$$

is not constant in general.

5. Proofs of the results of Section 3

The main technical result is Proposition 3.4.

We will use at several places the fact that $X_0,X_1:\Omega\to X$ are continuous. This is clear when $\Omega=C([0,1],X)$, since it is furnished with the topology of uniform convergence. In the general case, $\Omega=D([0,1],X)$ is furnished with the Skorokhod topology, and it is known that $X_t$ is not continuous in general. Still, $X_0$ and $X_1$ remain continuous, due to the specific form of the metric at the endpoints.

Proof of Proposition 3.4. The space $C_b(\Omega)$ is furnished with the supremum norm $\|f\|=\sup_\Omega|f|$, $f\in C_b(\Omega)$, and $C_b(\Omega)'$ is its topological dual space. Let $M_b(\Omega)$, resp. $M_b^+(\Omega)$, denote the space of all bounded, resp. bounded positive, Borel measures on $\Omega$. Of course, $M_b(\Omega)\subset C_b(\Omega)'$, with the identification $\langle f,Q\rangle_{C_b(\Omega),C_b(\Omega)'}=\int_\Omega f\,dQ$ for any $Q\in M_b(\Omega)$. For simplicity, we write $\langle f,Q\rangle:=\langle f,Q\rangle_{C_b(\Omega),C_b(\Omega)'}$.

Dropping the superscript $k$ for a moment, we have a measurable kernel $(R^x\in P(\Omega);x\in X)$ and $R^{\mu_0}:=\int_XR^x(\cdot)\,\mu_0(dx)$, where $\mu_0\in P(X)$ is the initial law.

Lemma 5.1. For all $Q\in C_b(\Omega)'$,

$$H(Q|R^{\mu_0})+\iota_{\{Q\in P(\Omega):\,Q_0=\mu_0\}}=\sup_{f\in C_b(\Omega)}\Big\{\langle f,Q\rangle-\int_X\log\langle e^f,R^x\rangle\,\mu_0(dx)\Big\}.$$

This identity should be compared with the well-known variational representation of the relative entropy,

$$H(Q|R)+\iota_{P(\Omega)}(Q)=\sup_{f\in C_b(\Omega)}\big\{\langle f,Q\rangle-\log\langle e^f,R\rangle\big\},\quad Q\in M_b(\Omega), \tag{29}$$

which holds for any reference probability measure $R\in P(\Omega)$ on any Polish space $\Omega$.
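On a finite set $\Omega$, where every function is continuous and bounded, (29) can be checked directly. For $Q\ll R$, the supremum is attained at $f=\log(dQ/dR)$, and any other $f$ gives a value that is not larger. The minimal numerical sketch below illustrates this; its names are our own.

```python
import numpy as np


def rel_entropy(q, r):
    """H(q|r) = sum q log(q/r) for probability vectors with q << r (convention 0 log 0 = 0)."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / r[mask])))


def dv_value(f, q, r):
    """<f, Q> - log <e^f, R>: the quantity maximized in (29)."""
    return float(f @ q - np.log(np.exp(f) @ r))


rng = np.random.default_rng(2)
r = rng.random(6); r /= r.sum()
q = rng.random(6); q /= q.sum()
h = rel_entropy(q, r)
best = dv_value(np.log(q / r), q, r)            # attains the supremum: equals h
others = [dv_value(rng.standard_normal(6), q, r) for _ in range(1000)]
print(h, best, max(others))                      # max(others) <= h
```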
Proof. Denote

$$\Theta(f)=\int_X\log\langle e^f,R^x\rangle\,\mu_0(dx)\in(-\infty,\infty],\quad f\in C_b(\Omega).$$

Its convex conjugate with respect to the duality $(C_b(\Omega),C_b(\Omega)')$ is given for all $Q\in C_b(\Omega)'$ by $\Theta^*(Q):=\sup_{f\in C_b(\Omega)}\{\langle f,Q\rangle-\Theta(f)\}$. It will be proved at Lemma 5.2 that any $Q\in C_b(\Omega)'$ such that $\Theta^*(Q)<\infty$ is in $M_b^+(\Omega)$. Let us admit this for a while, and take $Q\in M_b^+(\Omega)$ such that $\Theta^*(Q)<\infty$. Taking $f=\varphi(X_0)$ with $\varphi\in C_b(X)$, we see that $\sup_{\varphi\in C_b(X)}\int_X\varphi\,d(Q_0-\mu_0)\le\Theta^*(Q)<\infty$. Hence, $\Theta^*(Q)<\infty$ implies $Q_0=\mu_0$. This shows that if $\Theta^*(Q)<\infty$, then $Q$ is a probability measure with $Q_0=\mu_0$.

It remains to prove that $\Theta^*(Q)=H(Q|R^{\mu_0})$ for such a $Q\in P(\Omega)$. Since $\Omega$ is a Polish space, any $Q\in P(\Omega)$ such that $Q_0=\mu_0$ disintegrates as

$$Q(\cdot)=\int_XQ^x(\cdot)\,\mu_0(dx),$$

where $(Q^x;x\in X)$ is a measurable kernel of probability measures. We see that

$$\Theta^*(Q)=\sup_{f\in C_b(\Omega)}\int_X\big[\langle f,Q^x\rangle-\log\langle e^f,R^x\rangle\big]\,\mu_0(dx).$$

We obtain

$$\Theta^*(Q)\le\int_X\sup_{f\in C_b(\Omega)}\big[\langle f,Q^x\rangle-\log\langle e^f,R^x\rangle\big]\,\mu_0(dx)\overset{(29)}{=}\int_XH(Q^x|R^x)\,\mu_0(dx)=H(Q|R^{\mu_0}),$$

where (29) is used at the marked equality and the last equality follows from the tensorization property of the relative entropy.

Note that $x\mapsto H(Q^x|R^x)$ is measurable. Indeed, $(Q,R)\mapsto H(Q|R)$ is lower semicontinuous, being the supremum of continuous functions, see (29). Hence it is Borel measurable. On the other hand, $x\mapsto R^x$ and $x\mapsto Q^x$ are also measurable, being the disintegration kernels of Borel measures on a Polish space.

Let us prove the converse inequality. By Jensen's inequality, $\int_X\log\langle e^f,R^x\rangle\,\mu_0(dx)\le\log\int_X\langle e^f,R^x\rangle\,\mu_0(dx)=\log\langle e^f,R^{\mu_0}\rangle$, so that

$$\Theta^*(Q)\ge\sup_{f\in C_b(\Omega)}\Big\{\int_\Omega f\,dQ-\log\int_\Omega e^f\,dR^{\mu_0}\Big\}=H(Q|R^{\mu_0}),$$

where the equality is (29) again. This completes the proof of the lemma. $\square$
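The identity of Lemma 5.1 and the Jensen step of its proof can also be illustrated on finite spaces. In this sketch, $\Omega$ is a finite product $X\times Y$ with $X_0$ the first coordinate, so that $R^x$ lives on $\{x\}\times Y$ and is stored as a row of a stochastic matrix. When $Q_0=\mu_0$, the supremum defining $\Theta^*(Q)$ is attained at $f=\log(dQ^x/dR^x)$ and equals $H(Q|R^{\mu_0})$. The sketch is ours, and its names are hypothetical.

```python
import numpy as np


def theta(F, mu0, Rk):
    """Theta(f) = int log <e^f, R^x> mu0(dx) on the finite product space X x Y."""
    return float(mu0 @ np.log(np.sum(np.exp(F) * Rk, axis=1)))


def rel_entropy(q, r):
    """H(q|r) for probability arrays with q << r (convention 0 log 0 = 0)."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / r[mask])))


rng = np.random.default_rng(3)
nx, ny = 3, 4
mu0 = rng.random(nx); mu0 /= mu0.sum()
Rk = rng.random((nx, ny)); Rk /= Rk.sum(axis=1, keepdims=True)   # kernel x -> R^x
Qk = rng.random((nx, ny)); Qk /= Qk.sum(axis=1, keepdims=True)   # kernel x -> Q^x
R_mu0 = mu0[:, None] * Rk                        # R^{mu0}
Q = mu0[:, None] * Qk                            # Q with initial marginal Q_0 = mu0

h = rel_entropy(Q, R_mu0)
F_opt = np.log(Qk / Rk)                          # f = log dQ^x/dR^x
value_at_opt = float(np.sum(F_opt * Q)) - theta(F_opt, mu0, Rk)   # equals h
F = rng.standard_normal((nx, ny))
jensen_gap = np.log(np.sum(np.exp(F) * R_mu0)) - theta(F, mu0, Rk)  # >= 0 by Jensen
print(h, value_at_opt, jensen_gap)
```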

In the proof of Lemma 5.1, we used a result stated at Lemma 5.2. Denote also

$$\Lambda(f):=\int_X\sup_\Omega\{f-C^x\}\,\mu_0(dx)=\int_X\sup_{\Omega^x}\{f-C\}\,\mu_0(dx),\quad f\in C_b(\Omega),$$

where $\Omega^x:=\{X_0=x\}\subset\Omega$. It will appear later that $\Lambda$ is the convex conjugate of the $\Gamma$-limit $C^{\mu_0}$. The convex conjugate of $\Lambda$ with respect to the duality $(C_b(\Omega),C_b(\Omega)')$ is given for all $Q\in C_b(\Omega)'$ by $\Lambda^*(Q):=\sup_{f\in C_b(\Omega)}\{\langle f,Q\rangle-\Lambda(f)\}$.

Lemma 5.2. 

(%) {0*<oo}cM+(fi); 
^g; {A* < cx)} C M+(fi). 

Proof. For a positive element Q G Cb(fi)' to be in M&(f2), it necessary and sufficient 
that it is cr-additive. That is, for all decreasing sequence (f n )n>i in C&(fi) such that 
lim^oo f n = pointwise, we have lim n _ >0O (/ n , Q) = 0. 

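• Proof of (1). Let us prove that $\{\Theta^*<\infty\}\subset M_+(\Omega)$.

Let us show that $Q\ge0$ if $\Theta^*(Q)<\infty$. Let $f\in C_b(\Omega)$ be such that $f\ge0$. As $\Theta(af)\le0$ for all $a\le0$,

$$\Theta^*(Q)\ge\sup_{a\le0}\{a\langle f,Q\rangle-\Theta(af)\}\ge\sup_{a\le0}\{a\langle f,Q\rangle\}=\begin{cases}0,&\text{if }\langle f,Q\rangle\ge0\\+\infty,&\text{otherwise.}\end{cases}$$

Therefore, if $\Theta^*(Q)<\infty$, then $\langle f,Q\rangle\ge0$ for all $f\ge0$, which is the desired result.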

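Let us take a decreasing sequence $(f_n)_{n\ge1}$ in $C_b(\Omega)$ which converges pointwise to zero. By the dominated convergence theorem, we have

$$\lim_{n\to\infty}\Theta(af_n)=0,\qquad\forall a\ge0.$$

It follows that for all $Q\in C_b(\Omega)'$,

$$\begin{aligned}\Theta^*(Q)&\ge\sup_{a\ge0}\limsup_{n\to\infty}\{a\langle f_n,Q\rangle-\Theta(af_n)\}\\&\ge\sup_{a\ge0}\Big\{\limsup_{n\to\infty}a\langle f_n,Q\rangle-\lim_{n\to\infty}\Theta(af_n)\Big\}\\&=\sup_{a\ge0}a\limsup_{n\to\infty}\langle f_n,Q\rangle\\&=\begin{cases}0,&\text{if }\limsup_{n\to\infty}\langle f_n,Q\rangle\le0\\+\infty,&\text{otherwise.}\end{cases}\end{aligned}$$

Therefore, if $\Theta^*(Q)<\infty$, we have $\limsup_{n\to\infty}\langle f_n,Q\rangle\le0$. Since we have just seen that $Q\ge0$, we also have $\langle f_n,Q\rangle\ge0$ for all $n$, hence $\lim_{n\to\infty}\langle f_n,Q\rangle=0$, which is the desired result.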

• Proof of (2). Let us prove that $\{\Lambda^*<\infty\}\subset M_+(\Omega)$.

Let us show that $Q\ge0$ if $\Lambda^*(Q)<\infty$. Let $f\in C_b(\Omega)$ be such that $f\ge0$. As $\inf C=0$, $\Lambda(af)\le0$ for all $a\le0$, and we conclude as in item (1).

Let us take a decreasing sequence $(f_n)_{n\ge1}$ in $C_b(\Omega)$ which converges pointwise to zero. By Lemma 5.3 below, for all $x\in X$, $(\sup_\Omega\{f_n-C^x\})_{n\ge1}$ is a decreasing sequence and $\lim_{n\to\infty}\sup_\Omega\{f_n-C^x\}=0$. As $|\sup_\Omega\{f_n-C^x\}|\le\sup_\Omega|f_1|<\infty$ for all $n$ and $x$, we can apply the dominated convergence theorem to obtain that $\lim_{n\to\infty}\Lambda(af_n)=0$ for all $a\ge0$, and we conclude as in item (1).

Finally, one must be careful with the measurability of $x\in X\mapsto u_n(x):=\inf_\Omega\{C^x-f_n\}=-\sup_\Omega\{f_n-C^x\}\in\mathbb R$. Since $\Omega$ and $X$ are assumed to be polish, we can apply a general result by Beiglböck and Schachermayer [BS09, Lemmas 3.7, 3.8] which tells us that for each $n\ge1$ and each Borel probability measure $\mu$ on $X$, there exists a Borel measurable function $\tilde u_n$ on $X$ such that $\tilde u_n\le u_n$ and $\tilde u_n(x)=u_n(x)$ for $\mu$-a.e. $x\in X$. □

During the proof of the previous lemma we have invoked the following result. 

Lemma 5.3. Let $J$ be a coercive $[0,\infty]$-valued function on $\Omega$ and $(f_n)_{n\ge1}$ a decreasing sequence of continuous bounded functions on $\Omega$ which converges pointwise to some bounded upper semicontinuous function $f$. Then, $(\sup_\Omega\{f_n-J\})_{n\ge1}$ is a decreasing sequence and

$$\lim_{n\to\infty}\sup_\Omega\{f_n-J\}=\sup_\Omega\{f-J\}.$$

Proof. Changing sign and denoting $g_n=J-f_n$, $g=J-f$, we want to prove that $\lim_{n\to\infty}\inf_\Omega g_n=\inf_\Omega g$.

We see that $(g_n)_{n\ge1}$ is an increasing sequence of lower semicontinuous functions. It follows by Proposition 5.4 of [Mas93] that it is a $\Gamma$-convergent sequence and

$$\Gamma\text{-}\lim_{n\to\infty}g_n=\lim_{n\to\infty}g_n=g.\tag{30}$$

Let us admit for a while that there exists some compact set $K$ which satisfies

$$\inf_\Omega g_n=\inf_K g_n\tag{31}$$

for all $n$. This and the convergence (30) allow us to apply Theorem 7.4 of [Mas93] to obtain $\lim_{n\to\infty}\inf_\Omega g_n=\inf_\Omega\Gamma\text{-}\lim_{n\to\infty}g_n=\inf_\Omega g$, which is the desired result.

It remains to check that (31) is true. Let $\omega^*\in\Omega$ be such that $J(\omega^*)<\infty$ (if $J\equiv+\infty$, there is nothing to prove). Then, $\inf_\Omega g_n\le g_n(\omega^*)=J(\omega^*)-f_n(\omega^*)\le J(\omega^*)-f(\omega^*)\le J(\omega^*)-\inf_\Omega f$. On the other hand, for all $n$, $f_n\le f_1\le A:=\sup_\Omega f_1$. Let $B:=A+1+J(\omega^*)-\inf_\Omega f$. For all $\omega$ such that $J(\omega)>B$, we have $g_n(\omega)>B-\sup_n f_n(\omega)\ge B-A=J(\omega^*)-\inf_\Omega f+1$. We have just seen that for all $n$,

$$\inf_\Omega g_n\le J(\omega^*)-\inf_\Omega f\quad\text{and}\quad\inf_{\omega:\,J(\omega)>B}g_n(\omega)\ge J(\omega^*)-\inf_\Omega f+1.$$

This proves (31) with the compact sublevel set $K=\{J\le B\}$ and completes the proof of the lemma. □
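Here is a finite-grid sketch of Lemma 5.3. Every ingredient is an assumption chosen only for illustration. $\Omega$ is replaced by a grid of $[-3,3]$, and $J(\omega)=3(\omega+1/2)^2$ is the coercive function. The sequence $f_n(\omega)=\max(0,\min(1,1+n\omega))$ decreases to the upper semicontinuous indicator of $[0,\infty)$. The computed suprema $\sup\{f_n-J\}$ decrease and settle at $\sup\{f-J\}=1/4$.

```python
def f_n(omega, n):
    """Continuous bounded functions decreasing (in n) to the indicator of [0, inf)."""
    return max(0.0, min(1.0, 1.0 + n * omega))


def f_limit(omega):
    """The upper semicontinuous limit: indicator of the closed set [0, inf)."""
    return 1.0 if omega >= 0 else 0.0


def J(omega):
    """A coercive [0, inf)-valued function: its sublevel sets are compact intervals."""
    return 3.0 * (omega + 0.5) ** 2


grid = [i / 600 for i in range(-1800, 1801)]   # finite stand-in for Omega = [-3, 3]
sups = [max(f_n(w, n) - J(w) for w in grid) for n in range(1, 21)]
sup_limit = max(f_limit(w) - J(w) for w in grid)
```

On this grid the first suprema are $7/12$ and $1/3$. From $n=3$ on, the supremum is attained at $\omega=0$ and equals the limit value.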



Recall that for all $Q\in M_b(\Omega)$, $C_k(Q)=\frac1kH(Q|R^{k,\mu_0})+\iota_{\{Q_0=\mu_0\}}$. With Lemma 5.1 we see that

$$C_k(Q)=\Lambda_k^*(Q),\qquad Q\in C_b(\Omega)'\tag{32}$$

where $\Lambda_k^*$ is the convex conjugate of

$$\Lambda_k(f)=\int_X\frac1k\log\langle e^{kf},R^{k,x}\rangle\,\mu_0(dx),\qquad f\in C_b(\Omega)$$

with respect to the duality $(C_b(\Omega),C_b(\Omega)')$. The keystone of the proof of Proposition 3.4 is the following consequence of the Laplace-Varadhan principle.

Lemma 5.4. Under the assumptions of Proposition 3.4, for all $f\in C_b(\Omega)$, we have

(1) $\lim_{k\to\infty}\Lambda_k(f)=\Lambda(f)$;

(2) $\sup_k|\Lambda_k(f)|\le\|f\|$ and $|\Lambda(f)|\le\|f\|:=\sup_\Omega|f|$.

The functions $\Lambda_k$ and $\Lambda$ are convex.

Proof. Our assumptions allow us to apply the Laplace-Varadhan principle, see Theorem A.3. It tells us that for each $x\in X$,

$$\lim_{k\to\infty}\frac1k\log\langle e^{kf},R^{k,x}\rangle=\sup_\Omega\{f-C^x\}.$$

On the other hand, it is clear that for each $k\ge1$, $\big|\frac1k\log\langle e^{kf},R^{k,x}\rangle\big|\le\|f\|$. Passing to the limit, we also get $|\sup_\Omega\{f-C^x\}|\le\|f\|$. Now by the Lebesgue dominated convergence theorem, we obtain statements (1) and (2).

Note that $x\mapsto\sup_\Omega\{f-C^x\}$ is measurable as a pointwise limit of measurable functions. It is standard to prove with Hölder's inequality that $f\mapsto\frac1k\log\langle e^{kf},R^{k,x}\rangle$ is convex. It follows that $\Lambda_k$ and $\Lambda$ are also convex. □
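Lemma 5.4 can be illustrated numerically in a toy model that is assumed here and is not the paper's setting. $\Omega$ is a finite grid, and $R^k$ has weights proportional to $e^{-kC}$ with $C(\omega)=\omega^2/2$, so that $(R^k)$ obeys an LDP with rate $C$ on the grid. The sketch checks three things. First, $\frac1k\log\langle e^{kf},R^k\rangle$ approaches $\sup\{f-C\}$, within $\log(\text{grid size})/k$. Second, the bound $|\Lambda_k(f)|\le\|f\|$ holds. Third, the Hölder convexity holds at a midpoint.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


grid = [i / 100 for i in range(-400, 401)]      # finite stand-in for Omega
C = [w * w / 2 for w in grid]                   # rate function, min C = 0 at w = 0


def Lambda_k(f, k):
    """(1/k) log <e^{k f}, R^k>, where R^k has weights proportional to exp(-k C)."""
    log_norm = logsumexp([-k * c for c in C])   # normalizes R^k to a probability
    return (logsumexp([k * (fv - c) for fv, c in zip(f, C)]) - log_norm) / k


f = [math.sin(3 * w) for w in grid]
g = [math.cos(w) for w in grid]
target = max(fv - c for fv, c in zip(f, C))     # sup over the grid of f - C
values = {k: Lambda_k(f, k) for k in (1, 10, 100, 1000)}
midpoint = [(fv + gv) / 2 for fv, gv in zip(f, g)]
convexity_gap = (Lambda_k(f, 10) + Lambda_k(g, 10)) / 2 - Lambda_k(midpoint, 10)
```

Both log-sum-exp terms lie within $\log(\text{grid size})/k$ of their maxima, and this gives the error bound used in the check.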

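We are in a position to apply Corollary 6.4, whose assumptions (a), (b) and (d'') are satisfied by $(\Lambda_k)_{k\ge1}$ by Lemma 5.4 (with $c=1$). Let us equip $C_b(\Omega)'$ with the *-weak topology $\sigma(C_b(\Omega)',C_b(\Omega))$. By Corollary 6.4 we have

$$\Gamma\text{-}\lim_{k\to\infty}\Lambda_k^*=\Lambda^*\tag{33}$$

where

$$\Lambda^*(Q)=\sup_{f\in C_b(\Omega)}\Big\{\langle f,Q\rangle_{C_b(\Omega),C_b(\Omega)'}-\int_X\sup_\Omega\{f-C^x\}\,\mu_0(dx)\Big\},\qquad Q\in C_b(\Omega)'.$$

This limit still holds in $M_b(\Omega)\subset C_b(\Omega)'$, by Lemma 5.2.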

Because of (32), (33) and Lemma 5.2, to complete the proof of Proposition 3.4 it remains to prove the subsequent lemma.

Lemma 5.5. Let $C$ be a lower semicontinuous $[0,\infty]$-valued function on the polish space $\Omega$. Denote $C^x=C+\iota_{\{\theta=x\}}$ for each $x\in X$, where $\theta:\Omega\to X$ is a continuous map with values in the polish space $X$. Take $\mu\in P(X)$ and suppose that

$$\inf_\Omega C^x=0$$

for $\mu$-almost every $x\in X$. Then, we have

$$\sup_{f\in C_b(\Omega)}\Big[\langle f,Q\rangle-\int_X\sup_\Omega\{f-C^x\}\,\mu(dx)\Big]=\int_\Omega C\,dQ+\iota_{\{Q\in P(\Omega):\,\theta_\#Q=\mu\}},\qquad Q\in M_b(\Omega).\tag{34}$$

Note that since $C\ge0$ and $C$ is measurable, the integral $\int_\Omega C\,dP$ makes sense in $[0,\infty]$ for any $P\in P(\Omega)$.

As the function $C$ of Proposition 3.4 is such that $C^x$ is an LD rate function for all $x\in X$, it satisfies the assumption $\inf_\Omega C^x=0$ for $\mu$-almost every $x\in X$.



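Proof. Let us first check that if $Q\in M_b(\Omega)$ satisfies $\Lambda^*(Q)<\infty$, then $Q\in P(\Omega)$ and $\theta_\#Q=\mu\in P(X)$. We already know by Lemma 5.2 that $Q\in M_+(\Omega)$. Choosing $f=\varphi\circ\theta$ with $\varphi\in C_b(X)$, since $\inf_\Omega C^x=0$, we see that $\sup_\Omega\{\varphi\circ\theta-C^x\}=\varphi(x)$. Hence, $\sup_{\varphi\in C_b(X)}\int_X\varphi\,d(\theta_\#Q-\mu)\le\Lambda^*(Q)<\infty$. Since the left-hand side is the supremum of a linear functional over a vector space, it is finite only if it vanishes, which implies that $\theta_\#Q=\mu$. This proves the desired result.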

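It remains to prove the equality for a fixed $P\in P(\Omega)$ which satisfies $\theta_\#P=\mu$. Because $\Omega$ and $X$ are polish spaces, we know that $P$ disintegrates as follows: $P(\cdot)=\int_XP^x(\cdot)\,\mu(dx)$, with $x\in X\mapsto P^x(\cdot):=P(\cdot\mid\theta=x)\in P(\Omega)$ Borel measurable. For any $f\in C_b(\Omega)$,

$$\begin{aligned}\langle f,P\rangle-\int_X\sup_\Omega\{f-C^x\}\,\mu(dx)&=\int_X\big[\langle f,P^x\rangle-\sup_\Omega\{f-C^x\}\big]\,\mu(dx)\\&=\int_X\big[\langle C^x,P^x\rangle+\langle f-C^x-\sup_\Omega\{f-C^x\},P^x\rangle\big]\,\mu(dx)\\&\le\int_X\langle C^x,P^x\rangle\,\mu(dx)\\&=\int_\Omega C\,dP.\end{aligned}$$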



Optimizing over $f$, we obtain

$$\sup_{f\in C_b(\Omega)}\Big\{\langle f,P\rangle-\int_X\sup_\Omega\{f-C^x\}\,\mu(dx)\Big\}\le\int_\Omega C\,dP.$$

If $C$ is in $C_b(\Omega)$, the case of equality is obtained with $f=C$, $P$-a.e., and in this situation we see that the identity (34) is valid. This will be invoked very soon.

In the general case, $C$ is only assumed to be lower semicontinuous. By means of the Moreau-Yosida approximation procedure, which is implementable since $\Omega$ is a metric space, one can build an increasing sequence $(C_n)_{n\ge1}$ of functions in $C_b(\Omega)$ which converges pointwise to $C$. Therefore,

$$\begin{aligned}\sup_{f\in C_b(\Omega)}\Big\{\langle f,P\rangle-\int_X\sup_\Omega\{f-C^x\}\,\mu(dx)\Big\}&\le\int_\Omega C\,dP\\&\overset{\text{(i)}}{=}\sup_{n\ge1}\int_\Omega C_n\,dP\\&\overset{\text{(ii)}}{=}\sup_{n\ge1}\sup_{f\in C_b(\Omega)}\Big\{\langle f,P\rangle-\int_X\sup_\Omega\{f-C_n^x\}\,\mu(dx)\Big\}\\&=\sup_{f\in C_b(\Omega)}\Big\{\langle f,P\rangle+\sup_{n\ge1}\int_X\inf_\Omega\{C_n^x-f\}\,\mu(dx)\Big\}\\&\overset{\text{(iii)}}{\le}\sup_{f\in C_b(\Omega)}\Big\{\langle f,P\rangle+\int_X\inf_\Omega\{C^x-f\}\,\mu(dx)\Big\}\\&=\sup_{f\in C_b(\Omega)}\Big\{\langle f,P\rangle-\int_X\sup_\Omega\{f-C^x\}\,\mu(dx)\Big\},\end{aligned}$$

which proves the desired identity (34).

Equality (i) follows from the monotone convergence theorem. Since $C_n$ belongs to $C_b(\Omega)$, equality (ii) is valid (this has been proved a few lines earlier) and inequality (iii) is a direct consequence of $C_n\le C$ for all $n\ge1$. Note that $x\in X\mapsto\inf_\Omega\{C_n^x-f\}\in\mathbb R$ is upper semicontinuous and it is a fortiori Borel measurable. □
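The proof only needs some increasing sequence of bounded continuous functions converging to $C$. One concrete choice is the Lipschitz regularization $C_n(\omega)=\inf_\eta\{\min(C(\eta),n)+n\,d(\omega,\eta)\}$. We assume it here as a variant of the Moreau-Yosida procedure mentioned above; it is bounded by $n$ and $n$-Lipschitz. The sketch computes it on a grid of $[-2,2]$ for a hypothetical lower semicontinuous $C$ with an upward jump at $0$. It then checks that $C_n$ is increasing in $n$, that $C_n\le C$, and that $C_n$ converges to $C$.

```python
def C(x):
    """A lower semicontinuous, [0, inf)-valued function with an upward jump at 0."""
    return 0.0 if x <= 0 else 1.0 + x


grid = [i / 100 for i in range(-200, 201)]
C_on_grid = [C(y) for y in grid]


def C_n(x, n):
    """Lipschitz regularization inf_y {min(C(y), n) + n |x - y|}, with y over the grid."""
    return min(min(cy, n) + n * abs(x - y) for y, cy in zip(grid, C_on_grid))


levels = [1, 2, 5, 10, 50, 400]
approx = {n: [C_n(x, n) for x in grid] for n in levels}
```

The truncation $\min(C,n)$ keeps each $C_n$ bounded even where $C$ takes large values. The term $n\,d(\omega,\eta)$ makes $C_n$ continuous, and monotonicity in $n$ is immediate from the formula.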

Proofs of the remaining results. The keystone of the proofs of the remaining results is Proposition 3.4.

Proposition 3.1. Proposition 3.1 is a particular case of Proposition 3.4. Indeed, choose $\Omega=X^2$, which can be interpreted as the space of all $X$-valued paths on the two-point time interval $\{0,1\}$, and take $C(\omega)=c(\omega_0,\omega_1)$ where $c$ is assumed to be lower semicontinuous. With $\omega=(x',y)$, we see that $C^x(x',y)=c(x,y)+\iota_{\{x'=x\}}$ for all $x,x',y\in X$. The assumption that $c(x,\cdot)$ is coercive on $X$ is equivalent to the coerciveness of $C^x$ on $X^2$.

Corollary 3.5 and Theorem 3.6. With Proposition 3.4 in hand, Corollary 3.5 and Theorem 3.6 are immediate consequences of Theorem 7.1 and of the equi-coerciveness of $\{C,C_k;k\ge1\}$ with respect to the *-weak topology $\sigma(P(\Omega),C_b(\Omega))$. This equi-coerciveness follows from Corollary 6.4 and Lemma 5.4. The uniqueness of the solution to (S) follows from the strict convexity of the relative entropy.

Corollary 3.2 and Theorem 3.3. Similarly, once we have Proposition 3.1 in hand, Corollary 3.2 and Theorem 3.3 are immediate consequences of Theorem 7.1 and of the equi-coerciveness of $\{C_{01},C_{01}^k;k\ge1\}$ with respect to the *-weak topology $\sigma(P(X^2),C_b(X^2))$. This equi-coerciveness follows from a fact about couplings. Consider the set of all probability measures $\pi\in P(X^2)$ such that $\pi_0=\mu_0$ and $\pi_1$ belongs to the convergent sequence of prescribed final marginals indexed by $k\ge1$. This set is relatively compact, which is a consequence of Prokhorov's theorem in a polish space.

Again, the uniqueness of the solution to $(\mathrm S_{01})$ follows from the strict convexity of the relative entropy.
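The static statement can be watched in the simplest possible setting. As an illustration only, we assume that the reference measure on a $4\times4$ product grid has weights $e^{-kc(x,y)}$. This is our choice, not the paper's construction of $R^k$. Then $\frac1kH(\pi|R^k)=\langle c,\pi\rangle+\frac1k\sum\pi\log\pi$, and its minimizer over couplings of two uniform marginals can be computed with Sinkhorn's alternating scaling. We use Sinkhorn as a standard numerical method; it plays no role in the paper. Comparing the minimizer with the optimal plan shows that $\langle c,\pi_k\rangle$ lies within $\log(4)/k$ of the Monge-Kantorovich value, because $-\log 4\ge\sum\pi_k\log\pi_k\ge-\log16$.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_plan(a, b, cost, k, max_iter=5000, tol=1e-12):
    """Minimize <c, pi> + (1/k) sum pi log pi over couplings of a and b.

    Log-domain Sinkhorn: the minimizer has the form pi_ij = exp(u_i + v_j - k c_ij),
    and u, v are updated alternately so that row and column marginals match a and b.
    """
    n, m = len(a), len(b)
    u, v = [0.0] * n, [0.0] * m
    plan = []
    for _ in range(max_iter):
        u = [math.log(a[i]) - logsumexp([v[j] - k * cost[i][j] for j in range(m)])
             for i in range(n)]
        v = [math.log(b[j]) - logsumexp([u[i] - k * cost[i][j] for i in range(n)])
             for j in range(m)]
        plan = [[math.exp(u[i] + v[j] - k * cost[i][j]) for j in range(m)]
                for i in range(n)]
        # Columns are exact after the v-update; stop once the rows are too.
        if max(abs(sum(plan[i]) - a[i]) for i in range(n)) < tol:
            break
    return plan


x = [0.0, 0.3, 0.6, 0.9]
y = [0.1, 0.4, 0.7, 1.0]
cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
a = b = [0.25] * 4

# Monge-Kantorovich value: with uniform weights an optimal plan is a permutation.
ot_value = min(sum(cost[i][s[i]] for i in range(4)) / 4
               for s in itertools.permutations(range(4)))

transport_costs = {}
for k in (1, 4, 16):
    plan = sinkhorn_plan(a, b, cost, k)
    transport_costs[k] = sum(plan[i][j] * cost[i][j] for i in range(4) for j in range(4))
```

Uniform marginals keep the brute-force Monge-Kantorovich value exact. Working in the log domain avoids underflow of $e^{-kc}$ as $k$ grows.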

Note that, when $C$ and $c$ are linked by (17), one can also derive the equi-coerciveness of $\{C_{01},C_{01}^k;k\ge1\}$ from the equi-coerciveness of $\{C,C_k;k\ge1\}$, as in the proof of Theorem
Theorem 3.7. The proof of Theorem 3.7 relies upon the subsequent lemma.

Lemma 5.6. Under the assumptions of Proposition 3.4, the function $c$ defined by (17) is lower semicontinuous and

$$\inf\Big\{\int_\Omega C\,dP;\ P\in P(\Omega),\ P_{01}=\pi\Big\}=\int_{X^2}c\,d\pi\in[0,\infty],$$

for all $\pi\in P(X^2)$.

Proof. Let us define the function

$$\Psi(\pi):=\inf\Big\{\int_\Omega C\,dP;\ P\in P(\Omega):P_{01}=\pi\Big\},\qquad\pi\in P(X^2).$$



As $C$ is assumed to be lower semicontinuous on $\Omega$, $\Psi$ satisfies the Kantorovich type dual equality:

$$\Psi(\pi)=\sup_{f\in\mathcal F}\int_{X^2}f\,d\pi,\qquad\pi\in P(X^2)\tag{35}$$

where $\mathcal F:=\{f\in C_b(X^2);\ f(X_0,X_1)\le C\}$. For a proof of (35), one can rewrite mutatis mutandis the proof of the Kantorovich dual equality. See for instance [Leo, Thm 3.2] and note that this result takes into account cost functions which may take infinite values, as in the present case.

This shows that $\Psi$ is a lower semicontinuous function on $P(X^2)$, being the supremum of continuous functions. Define the function

$$\psi(x,y):=\Psi(\delta_{(x,y)}),\qquad x,y\in X.$$

Since $(x,y)\mapsto\delta_{(x,y)}$ is continuous, we deduce immediately from the lower semicontinuity of $\Psi$ that $\psi$ is lower semicontinuous on $X^2$. Hence it is Borel measurable. Since it is $[0,\infty]$-valued, the integral $\int_{X^2}\psi\,d\pi$ is meaningful for all $\pi\in P(X^2)$. We are going to prove that

$$\Psi(\pi)=\int_{X^2}\psi\,d\pi,\qquad\pi\in P(X^2).\tag{36}$$

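For any $\pi\in P(X^2)$, we obtain

$$\begin{aligned}\Psi(\pi)&=\inf\Big\{\int_{X^2}\Big(\int_\Omega C\,dP^{xy}\Big)\pi(dxdy);\ P\in P(\Omega):P_{01}=\pi\Big\}\\&\ge\int_{X^2}\inf\Big\{\int_\Omega C\,dP;\ P\in P(\Omega):P_{01}=\delta_{(x,y)}\Big\}\,\pi(dxdy)\\&=\int_{X^2}\psi\,d\pi.\end{aligned}$$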

Let us show the converse inequality. With (35), we see that for each $f\in\mathcal F$ and all $(x,y)\in X^2$, $\psi(x,y)=\Psi(\delta_{(x,y)})\ge\int_{X^2}f\,d\delta_{(x,y)}=f(x,y)$. That is, $f\le\psi$ for all $f\in\mathcal F$. Therefore, $\Psi(\pi)=\sup_{f\in\mathcal F}\int_{X^2}f\,d\pi\le\int_{X^2}\psi\,d\pi$, completing the proof of (36).

It remains to establish that $\psi=c$. With (35), we get $\psi=\sup\mathcal F$ (pointwise supremum). But it is clear that $f\in\mathcal F$ if and only if for all $x,y\in X$, $f(x,y)\le\inf\{C(\omega);\ \omega\in\Omega:\omega_0=x,\omega_1=y\}=:c(x,y)$. Hence, $\psi$ is the upper envelope of the set of all functions $f\in C_b(X^2)$ such that $f\le c$. In other words, $\psi$ is the lower semicontinuous envelope $\mathrm{ls}\,c$ of $c$. Finally, for all $x,y\in X$, $\mathrm{ls}\,c(x,y)=\psi(x,y)=\inf\{\int_\Omega C\,dP;\ P\in P(\Omega):P_{01}=\delta_{(x,y)}\}\ge c(x,y)\ge\mathrm{ls}\,c(x,y)$. This implies the desired result: $\psi=\mathrm{ls}\,c=c$. □
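Lemma 5.6 says that the dynamic cost of a coupling is obtained by sending each pair $(x,y)$ along geodesics. Here is a discrete sketch in which every ingredient is assumed for illustration. $X=\{0,1,2\}$, paths are observed at times $0,1/2,1$, and $C(\omega)$ is the sum of squared increments. Then $c(x,y)=\min_z C(x,z,y)$. A plan concentrated on minimizing midpoints achieves $\int c\,d\pi$, and randomly chosen plans with the same endpoint law $\pi$ never do better.

```python
import random

states = [0, 1, 2]


def step_cost(u, v):
    return (u - v) ** 2


def path_cost(path):
    """C(omega) for a path omega = (x, z, y) observed at times 0, 1/2, 1."""
    x, z, y = path
    return step_cost(x, z) + step_cost(z, y)


# c(x, y) = inf { C(omega) : omega_0 = x, omega_1 = y }
c = {(x, y): min(path_cost((x, z, y)) for z in states) for x in states for y in states}

pi = {(x, y): 0.0 for x in states for y in states}   # endpoint law P_01
pi[(0, 2)] = 0.5
pi[(2, 0)] = 0.3
pi[(1, 1)] = 0.2
static_value = sum(pi[xy] * c[xy] for xy in pi)

# A plan concentrated on geodesics: P^{xy} is a Dirac mass at a minimizing midpoint.
geodesic_value = sum(
    mass * path_cost((x, min(states, key=lambda z: path_cost((x, z, y))), y))
    for (x, y), mass in pi.items()
)


def random_plan_value(rng):
    """Cost of P = sum pi(x, y) P^{xy} with random midpoint kernels P^{xy}."""
    total = 0.0
    for (x, y), mass in pi.items():
        weights = [rng.random() for _ in states]
        s = sum(weights)
        total += mass * sum(w / s * path_cost((x, z, y)) for w, z in zip(weights, states))
    return total


rng = random.Random(1)
random_values = [random_plan_value(rng) for _ in range(1000)]
```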
With this result at hand, let us prove Theorem 3.7. It is assumed that for any $x\in X$, $(R^{k,x})_{k\ge1}$ satisfies the LDP with scale $k$ and rate function $C^x$. We have $p^{k,x}=(X_1)_\#R^{k,x}$. Taking the continuous map $X_1:\Omega\to X$, by means of the contraction principle (see Theorem A.2 in the Appendix), we obtain that for any $x\in X$, $(p^{k,x})_{k\ge1}$ satisfies the LDP with scale $k$ and rate function

$$y\in X\mapsto\inf\{C^x(\omega);\ \omega\in\Omega:\omega_1=y\}=c(x,y)\in[0,\infty].$$



• Proof of (1). The first assertion of Theorem 3.7 follows from the lower semicontinuity of $c$, which was obtained in Lemma 5.6. Indeed, this shows that the assumptions of Proposition 3.1 are fulfilled. The identity $\inf(\mathrm{MK}_{dyn})=\inf(\mathrm{MK})$ is a direct consequence of Lemma 5.6.



• Proof of (2). The second assertion follows from three ingredients. The first is the identity $\inf(\mathrm{MK}_{dyn})=\inf(\mathrm{MK})$. The second is the convergence of the minimal values, which was obtained at item (1). The third is the strict convexity (for the uniqueness) and the coerciveness (for the existence) of the relative entropy. The relation between $P^k$ and $\pi^k$ is (13).

• Proof of (3). Let us first show that $P\mapsto\langle C,P\rangle+\iota_{\{P_0=\mu_0\}}$ is coercive on $P(\Omega)$. By (34) and the proof of Corollary 6.4, we see that its sublevel sets are relatively compact. Since $C$ is lower semicontinuous, this function is also lower semicontinuous. Therefore, it is coercive and so is $P\mapsto\langle C,P\rangle+\iota_{\{P_0=\mu_0,P_1=\mu_1\}}$. In particular, if $\inf(\mathrm{MK}_{dyn})<\infty$, the set of minimizers of $(\mathrm{MK}_{dyn})$ is a nonempty convex compact subset of $P(\Omega)$.



Let $\hat P$ be such a minimizer. It disintegrates as $\hat P(\cdot)=\int_{X^2}\hat P^{xy}(\cdot)\,\hat P_{01}(dxdy)$, and with Lemma 5.6 we see that $\hat\pi:=\hat P_{01}$ is a solution to (MK). Moreover, $\int_{X^2}c\,d\hat\pi=\Psi(\hat\pi)=\int_\Omega C\,d\hat P=\int_{X^2}\big(\int_\Omega C\,d\hat P^{xy}\big)\hat\pi(dxdy)$ and $\int_\Omega C\,d\hat P^{xy}\ge c(x,y)$ for $\hat\pi$-a.e. $(x,y)$. Hence, $\int_\Omega C\,d\hat P^{xy}=c(x,y)$ for $\hat\pi$-a.e. $(x,y)$. This means that for $\hat\pi$-a.e. $(x,y)$, $\hat P^{xy}(\Gamma^{xy})=1$ where $\Gamma^{xy}:=\{\omega\in\Omega;\ \omega_0=x,\omega_1=y,C(\omega)=c(x,y)\}$ is the set of all geodesic paths from $x$ to $y$. Remark that $\Gamma^{xy}$ is a compact subset of $\Omega$ which is nonempty as soon as $c(x,y)<\infty$. In particular, it is a Borel measurable subset. Following the cases of equality, it is clear that if, conversely, $P\in P(\Omega)$ satisfies $P^{xy}(\Gamma^{xy})=1$ for $P_{01}$-a.e. $(x,y)$, then $P$ minimizes $Q\mapsto\int_\Omega C\,dQ$ subject to $Q_{01}=P_{01}$. This completes the proof of the theorem.



6. Γ-CONVERGENCE OF CONVEX FUNCTIONS ON A WEAKLY COMPACT SPACE

A typical result about the Γ-convergence of a sequence of convex functions $(f_k)_{k\ge1}$ is the following: if the sequence of the convex conjugates $(f_k^*)_{k\ge1}$ converges in some sense, then $(f_k)_{k\ge1}$ Γ-converges. Known results of this type are usually stated in separable reflexive Banach spaces. For instance, Corollary 3.13 of H. Attouch's monograph [Att84] states:

Theorem 6.1. Let $X$ be a separable reflexive Banach space and $(f_k)_{k\ge1}$ a sequence of closed convex functions from $X$ into $(-\infty,+\infty]$ satisfying the equi-coerciveness assumption: $f_k(x)\ge\alpha(\|x\|)$ for all $x\in X$ and $k\ge1$, with $\lim_{r\to+\infty}\alpha(r)/r=+\infty$. Then, the following statements are equivalent:

(1) $f=\mathrm{seq}X_w\text{-}\Gamma\text{-}\lim_{k\to\infty}f_k$;

(2) $f^*=X^*_s\text{-}\Gamma\text{-}\lim_{k\to\infty}f_k^*$;

(3) $\forall y\in X^*$, $f^*(y)=\lim_{k\to\infty}f_k^*(y)$,

where $X^*$ is the dual space of $X$, $\mathrm{seq}X_w$ refers to the weak sequential convergence in $X$ and $X^*_s$ to the strong convergence in $X^*$.
Going beyond the reflexivity assumption is not so easy, as can be seen in Beer's monograph [Bee93].

In some applications in probability, the reflexive Banach space setting is not as natural as it is for the usual applications of variational convergence to PDEs. For instance, when dealing with random measures on $X$, the narrow topology $\sigma(P(X),C_b(X))$ doesn't fit the above framework since $C_b(X)$ endowed with the uniform topology may not be separable (unless $X$ is compact) and is not reflexive.

The next result is an analogue of Theorem 6.1 which is suited to applications involving random probability measures. Since we didn't find it in the literature, we give its detailed proof.

Let $X$ and $Y$ be two vector spaces in separating duality. The space $X$ is furnished with the weak topology $\sigma(X,Y)$.

We denote by $\iota_C$ the indicator function of the subset $C$ of $X$, which is defined by $\iota_C(x)=0$ if $x$ belongs to $C$ and $\iota_C(x)=+\infty$ otherwise. Its convex conjugate is the support function of $C$: $\iota_C^*(y)=\sup_{x\in C}\langle x,y\rangle$, $y\in Y$.

Theorem 6.2. Let $(g_k)_{k\ge1}$ be a sequence of functions on $Y$ such that

(a) for all $k$, $g_k$ is a real-valued convex function on $Y$,

(b) $(g_k)_{k\ge1}$ converges pointwise to $g:=\lim_{k\to\infty}g_k$,

(c) $g$ is real-valued and

(d) in restriction to any finite dimensional vector subspace $Z$ of $Y$, $(g_k)_{k\ge1}$ Γ-converges to $g$, i.e. $\Gamma\text{-}\lim_{k\to\infty}(g_k+\iota_Z)=g+\iota_Z$, where $\iota_Z$ is the indicator function of $Z$.

Denote the convex conjugates on $X$: $f_k=g_k^*$ and $f=g^*$. If in addition,

(e) there exists a $\sigma(X,Y)$-compact set $K\subset X$ such that $\mathrm{dom}f_k\subset K$ for all $k\ge1$ and $\mathrm{dom}f\subset K$,

then $(f_k)_{k\ge1}$ Γ-converges to $f$ with respect to $\sigma(X,Y)$.

The proof of this theorem is postponed after the two preliminary Lemmas 6.5 and 6.6.

Remark 6.3. By [Mas93, Proposition 5.12], under assumption (a), assumption (d) is implied by:

(d') in restriction to any finite dimensional vector subspace $Z$ of $Y$, $(g_k)_{k\ge1}$ is equi-bounded, i.e. for all $y_o\in Z$, there exists $\delta>0$ such that

$$\sup_{k\ge1}\sup\{|g_k(y)|;\ y\in Z,\ |y-y_o|\le\delta\}<\infty.$$

A useful consequence of Theorem 6.2 is the following.

Corollary 6.4. Let $(Y,\|\cdot\|)$ be a normed space and $X$ its topological dual space. Let $(g_k)_{k\ge1}$ be a sequence of functions on $Y$ such that

(a) for all $k$, $g_k$ is a real-valued convex function on $Y$,

(b) $(g_k)_{k\ge1}$ converges pointwise to $g:=\lim_{k\to\infty}g_k$ and

(d'') there exists $c>0$ such that $|g_k(y)|\le c(1+\|y\|)$ for all $y\in Y$ and $k\ge1$.

Then, $(f_k)_{k\ge1}$ Γ-converges to $f$ with respect to $\sigma(X,Y)$, where $f_k=g_k^*$ and $f=g^*$. Moreover, there exists a $\sigma(X,Y)$-compact set $K\subset X$ such that $\mathrm{dom}f_k\subset K$ for all $k\ge1$ and $\mathrm{dom}f\subset K$.

Proof. Under (b), (d'') implies (c). As (d'') implies (d'), we have (d) by Remark 6.3. Finally, (d'') implies (e) with $K=\{x\in X;\ \|x\|_*\le c\}$, where $\|x\|_*=\sup_{\|y\|\le1}\langle x,y\rangle$ is the dual norm on $X$. This set $K$ is $\sigma(X,Y)$-compact by the Banach-Alaoglu theorem. Indeed, suppose that for all $y\in Y$, $g(y)\le c+c\|y\|$ and take $x\in X$ such that $g^*(x)<+\infty$. As for all $y$, $\langle x,y\rangle\le g(y)+g^*(x)$, we get $\langle x,y/\|y\|\rangle\le(g^*(x)+c)/\|y\|+c$. Letting $\|y\|$ tend to infinity along any fixed direction gives $\|x\|_*\le c$, which is the announced result. The same argument applies to each $g_k$.

The conclusion follows from Theorem 6.2. □
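The compactness part of the proof can be seen in one dimension. Assume $Y=X=\mathbb R$ and take the illustrative function $g(y)=c\sqrt{1+y^2}$, which satisfies (d''). Its conjugate $g^*(x)=-c\sqrt{1-(x/c)^2}$ is finite only for $|x|\le c$. The sketch approximates $g^*$ by a maximum over a truncated grid. Inside $[-c,c]$ it matches the closed form, while outside it grows without bound as the truncation is widened.

```python
import math

c_const = 2.0


def g(y):
    """A real-valued convex function with |g(y)| <= c (1 + |y|)."""
    return c_const * math.sqrt(1.0 + y * y)


def conjugate(x, half_width, step=1e-3):
    """Grid approximation of g*(x) = sup_y {x y - g(y)} restricted to |y| <= half_width."""
    n = int(round(half_width / step))
    return max(x * (i * step) - g(i * step) for i in range(-n, n + 1))


inside = conjugate(0.6 * c_const, half_width=50.0)
exact_inside = -c_const * math.sqrt(1.0 - 0.6 ** 2)    # closed form, valid for |x| <= c
outside_small = conjugate(1.5 * c_const, half_width=50.0)
outside_large = conjugate(1.5 * c_const, half_width=100.0)
```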

Lemma 6.5. Let $f:X\to(-\infty,+\infty]$ be a lower semicontinuous convex function such that $\mathrm{dom}f$ is included in a compact set. Let $V$ be a closed convex subset of $X$. Then, if $V$ satisfies

$$V\cap\mathrm{dom}f\neq\emptyset\quad\text{or}\quad V\cap\mathrm{cl\,dom}f=\emptyset,\tag{37}$$

we have

$$\inf_{x\in V}f(x)=-\inf_{y\in Y}\big(f^*(y)+\iota_V^*(-y)\big)\in(-\infty,\infty]\tag{38}$$

and if $V$ doesn't satisfy (37), we have

$$\inf_{x\in W}f(x)=-\inf_{y\in Y}\big(f^*(y)+\iota_W^*(-y)\big)=+\infty\tag{39}$$

for every closed convex set $W$ such that $W\subset\mathrm{int}\,V$.

Proof. The proof is divided into two parts. We first consider the case where $V\cap\mathrm{dom}f\neq\emptyset$, then the case where $V\cap\mathrm{cl\,dom}f=\emptyset$.

• The case where $V\cap\mathrm{dom}f\neq\emptyset$. As $V$ is a nonempty closed convex set, its indicator function $\iota_V$ is a closed convex function so that its biconjugate satisfies $\iota_V^{**}=\iota_V$, i.e. $\iota_V(x)=\sup_{y\in Y}\{\langle x,y\rangle-\iota_V^*(y)\}$ for all $x\in X$. Consequently,

$$\inf_{x\in V}f(x)=\inf_{x\in X}\sup_{y\in Y}\{f(x)+\langle x,y\rangle-\iota_V^*(y)\}.$$

One wishes to interchange $\inf_{x\in X}$ and $\sup_{y\in Y}$ by means of the following standard inf-sup theorem (see [Eke74] for instance). We have $\inf_{x\in X}\sup_{y\in Y}F(x,y)=\sup_{y\in Y}\inf_{x\in X}F(x,y)$ provided that $\inf_{x\in X}\sup_{y\in Y}F(x,y)\neq\pm\infty$ and

- $\mathrm{dom}F$ is a product of convex sets,

- $x\mapsto F(x,y)$ is convex and lower semicontinuous for all $y$,

- there exists $y_o$ such that $x\mapsto F(x,y_o)$ is coercive and

- $y\mapsto F(x,y)$ is concave for all $x$.

Our assumptions on $f$ allow us to apply this result with $F(x,y)=f(x)+\langle x,y\rangle-\iota_V^*(y)$. Note that

$$\inf_{x\in X}f(x)>-\infty\tag{40}$$

since $f$ doesn't take the value $-\infty$ and is assumed to be lower semicontinuous on a compact set. Therefore, if $\inf_{x\in V}f(x)<+\infty$, we have

$$\inf_{x\in V}f(x)=\sup_{y\in Y}\inf_{x\in X}\{f(x)+\langle x,y\rangle-\iota_V^*(y)\}=-\inf_{y\in Y}\{f^*(y)+\iota_V^*(-y)\}.$$

• The case where $V\cap\mathrm{cl\,dom}f=\emptyset$. As $\mathrm{cl\,dom}f$ is compact, by the Hahn-Banach theorem $\mathrm{cl\,dom}f$ and $V$ are strictly separated: there exists $y_o\in Y$ such that $\iota_V^*(y_o)=\sup_{x\in V}\langle x,y_o\rangle<\inf_{x\in\mathrm{cl\,dom}f}\langle x,y_o\rangle\le\inf_{x\in\mathrm{dom}f}\langle x,y_o\rangle$. Hence,

$$\inf_{x\in\mathrm{dom}f}\{\langle x,y_o\rangle-\iota_V^*(y_o)\}>0\tag{41}$$

and

$$\begin{aligned}-\inf_{y\in Y}\big(f^*(y)+\iota_V^*(-y)\big)&=\sup_{y\in Y}\inf_{x\in X}\{f(x)+\langle x,y\rangle-\iota_V^*(y)\}\\&=\sup_{y\in Y}\inf_{x\in\mathrm{dom}f}\{f(x)+\langle x,y\rangle-\iota_V^*(y)\}\\&\ge\inf_{x\in X}f(x)+\sup_{a\ge0}\inf_{x\in\mathrm{dom}f}\{\langle x,ay_o\rangle-\iota_V^*(ay_o)\}\\&=\inf_{x\in X}f(x)+\sup_{a\ge0}a\inf_{x\in\mathrm{dom}f}\{\langle x,y_o\rangle-\iota_V^*(y_o)\}\\&=+\infty\end{aligned}$$

where the last equality follows from (40) and (41). This proves that (39) holds with $W=V$.

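• Finally, suppose that (37) isn't satisfied, i.e. $V\cap\mathrm{dom}f=\emptyset$ and $V\cap\mathrm{cl\,dom}f\neq\emptyset$. Take a closed convex set $W$ such that $W\subset\mathrm{int}\,V$. As $\mathrm{int}\,V$ is open and does not meet $\mathrm{dom}f$, it does not meet $\mathrm{cl\,dom}f$ either. Hence $W\cap\mathrm{cl\,dom}f=\emptyset$, which ensures the strict separation of $W$ and $\mathrm{cl\,dom}f$ as above. The previous case applied to $W$ gives (39). □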
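Here is a one-dimensional check of (38) and (39) under illustrative assumptions. We take $X=Y=\mathbb R$, $f(x)=x^2$ on $\mathrm{dom}f=[-1,1]$, and $V$ a closed interval. The support function of $V$ is $\iota_V^*(y)=\max(y\inf V,y\sup V)$. Both $f^*$ and the infimum over $y$ are approximated on grids.

```python
import math

x_grid = [i / 100 for i in range(-100, 101)]     # dom f = [-1, 1], a compact set
y_grid = [j / 100 for j in range(-1000, 1001)]   # truncated range of the dual variable


def f(x):
    """f(x) = x^2 on [-1, 1] (and +infinity outside, implicitly)."""
    return x * x


def f_conj(y):
    """f*(y) = sup_x {x y - f(x)}, computed over the grid of dom f."""
    return max(x * y - f(x) for x in x_grid)


def support(lo, hi, y):
    """Support function of the interval V = [lo, hi]."""
    return max(lo * y, hi * y)


def primal(lo, hi):
    """inf_{x in V} f(x), equal to +infinity when V misses dom f."""
    values = [f(x) for x in x_grid if lo <= x <= hi]
    return min(values) if values else math.inf


def dual(lo, hi):
    """-inf_y {f*(y) + iota_V^*(-y)} over the truncated y-grid."""
    return -min(f_conj(y) + support(lo, hi, -y) for y in y_grid)


meets = (primal(0.5, 3.0), dual(0.5, 3.0))        # V meets dom f
disjoint = (primal(2.0, 3.0), dual(2.0, 3.0))     # V misses cl dom f
```

When $V$ meets $\mathrm{dom}f$, both sides equal $1/4$. In the disjoint case the dual value is finite only because $y$ is truncated to $|y|\le10$. It grows linearly as the truncation widens, consistent with (39).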

Lemma 6.6. Let the $\sigma(X,Y)$-closed convex neighbourhood $V$ of the origin be defined by

$$V=\{x\in X;\ \langle y_i,x\rangle\le1,\ 1\le i\le n\}\tag{42}$$

with $n\ge1$ and $y_1,\dots,y_n\in Y$. Its support function $\iota_V^*$ is $[0,\infty]$-valued, coercive and its domain is the finite dimensional convex cone spanned by $\{y_1,\dots,y_n\}$. More precisely, its level sets are $\{\iota_V^*\le b\}=b\,\mathrm{cv}\{0,y_1,\dots,y_n\}$ for each $b>0$, where $\mathrm{cv}\{0,y_1,\dots,y_n\}$ is the convex hull of $\{0,y_1,\dots,y_n\}$.

Proof. The closed convex set $V$ is the polar set of $N=\{y_1,\dots,y_n\}$: $V=N^\circ$. Let $x_1\in V$ and $x_o\in E:=\bigcap_{1\le i\le n}\ker y_i$. Then, $\langle y_i,x_1+x_o\rangle=\langle y_i,x_1\rangle\le1$. Hence, $x_1+x_o\in V$. Considering the factor space $X/E$, we now work within a finite dimensional vector space whose algebraic dual space is spanned by $\{y_1,\dots,y_n\}$.

We still denote by $X$ and $Y$ these finite dimensional spaces. We are allowed to apply the finite dimensional results which are proved in the book [RW98] by Rockafellar and Wets. In particular, one knows that if $C$ is a closed convex set in $Y$ containing the origin, then the gauge function $\gamma_C(y):=\inf\{\lambda\ge0;\ y\in\lambda C\}$, $y\in Y$, is the support function of its polar set $C^\circ=\{x\in X;\ \langle x,y\rangle\le1,\ \forall y\in C\}$. This means that $\gamma_C=\iota_{C^\circ}^*$ (see [RW98, Example 11.19]).

As $V=(N^{\circ\circ})^\circ$ and $N^{\circ\circ}$ is the closed convex hull of $N\cup\{0\}$, i.e. $N^{\circ\circ}=\mathrm{cv}(N\cup\{0\})$, we get $V=\mathrm{cv}(N\cup\{0\})^\circ$ and

$$\iota_V^*=\gamma_{\mathrm{cv}(N\cup\{0\})}.$$

In particular, for all $b>0$, $\iota_V^*(y)\le b\iff\gamma_{\mathrm{cv}(N\cup\{0\})}(y)\le b\iff y\in b\,\mathrm{cv}(N\cup\{0\})$. It follows that the effective domain of $\iota_V^*$ is the convex cone spanned by $y_1,\dots,y_n$ and $\iota_V^*$ is coercive. □
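Here is a planar check of Lemma 6.6 with the hypothetical choice $y_1=(1,0)$, $y_2=(1,1)$ in $X=Y=\mathbb R^2$. For $y=\lambda_1y_1+\lambda_2y_2$ with $\lambda_1,\lambda_2\ge0$, the gauge of $\mathrm{cv}\{0,y_1,y_2\}$ is $\lambda_1+\lambda_2$. The sketch compares it with a brute-force maximization of $\langle x,y\rangle$ over $V$ intersected with a box. The point $(1/2,0)$ lies in the level set $\{\iota_V^*\le1\}$ but not in the segment $\mathrm{cv}\{y_1,y_2\}$. So the origin has to be part of the convex hull that describes the level sets.

```python
y1 = (1.0, 0.0)
y2 = (1.0, 1.0)


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def in_V(x):
    """V = {x : <y1, x> <= 1 and <y2, x> <= 1}."""
    return dot(y1, x) <= 1.0 and dot(y2, x) <= 1.0


def support_brute(y, radius=5, per_unit=20):
    """Maximum of <x, y> over grid points of V inside the box [-radius, radius]^2."""
    n = radius * per_unit
    return max(
        dot((i / per_unit, j / per_unit), y)
        for i in range(-n, n + 1)
        for j in range(-n, n + 1)
        if in_V((i / per_unit, j / per_unit))
    )


def gauge(y):
    """Gauge of cv{0, y1, y2}: lam1 + lam2 if y = lam1 y1 + lam2 y2 with lam >= 0, else +inf."""
    det = y1[0] * y2[1] - y2[0] * y1[1]
    lam1 = (y[0] * y2[1] - y2[0] * y[1]) / det
    lam2 = (y1[0] * y[1] - y[0] * y1[1]) / det
    return lam1 + lam2 if lam1 >= 0 and lam2 >= 0 else float("inf")
```

For $y$ outside the cone, here $y=(0,1)$, the brute-force maximum keeps growing with the box, which matches $\iota_V^*(y)=+\infty$.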

Proof of Theorem 6.2. Let $\mathcal N(x_o)$ denote the set of all the neighbourhoods of $x_o\in X$. We want to prove that $\Gamma\text{-}\lim_{k\to\infty}f_k(x_o):=\sup_{U\in\mathcal N(x_o)}\lim_{k\to\infty}\inf_{x\in U}f_k(x)=f(x_o)$. Since $f$ is lower semicontinuous, we have $f(x_o)=\sup_{U\in\mathcal N(x_o)}\inf_{x\in U}f(x)$. So it is enough to show that for all $U\in\mathcal N(x_o)$, there exists $V\in\mathcal N(x_o)$ such that $V\subset U$ and

$$\lim_{k\to\infty}\inf_{x\in V}f_k(x)=\inf_{x\in V}f(x).\tag{43}$$

The topology $\sigma(X,Y)$ is such that $\mathcal N(x_o)$ admits the sets

$$V=\{x\in X;\ |\langle y_i,x-x_o\rangle|\le1,\ i\le n\}$$

as a base, where $(y_1,\dots,y_n)$, $n\ge1$, describes the collection of all the finite families of vectors in $Y$. By Lemma 6.5, there exists such a $V\subset U$ which satisfies

$$\inf_{x\in V}f_k(x)=-\inf_{y\in Y}h_k(y)\ \text{for all }k\ge1\quad\text{and}\quad\inf_{x\in V}f(x)=-\inf_{y\in Y}h(y)$$

where we denote $h_k(y)=g_k(y)+\iota_V^*(-y)$ and $h(y)=g(y)+\iota_V^*(-y)$, $y\in Y$.

Let $Z$ denote the vector space spanned by $(y_1,\dots,y_n)$ and $h_k^Z,h^Z$ the restrictions to $Z$ of $h_k$ and $h$. For all $y\in Y$, we have

$$\iota_V^*(-y)=-\langle x_o,y\rangle+\iota_{V-x_o}^*(-y)\tag{44}$$

and by (44) and Lemma 6.6 the effective domain of $\iota_V^*$ is $Z$. Therefore, to prove (43) it remains to show that

$$\lim_{k\to\infty}\inf_{y\in Z}h_k^Z(y)=\inf_{y\in Z}h^Z(y).\tag{45}$$

By assumptions (b) and (d), $(h_k^Z)$ Γ-converges and pointwise converges to $h^Z$. Note that this Γ-convergence is a consequence of the lower semicontinuity of the convex conjugate $\iota_V^*$ and Proposition 6.25 of [Mas93].

Because of assumptions (a) and (c), $(h_k^Z)$ is also a sequence of finite convex functions which converges pointwise to the finite function $h^Z$. By [Roc97, Theorem 10.8], $(h_k^Z)$ converges to $h^Z$ uniformly on any compact subset of $Z$ and $h^Z$ is convex.

We now consider three cases for $x_o$.

The case where $x_o\in\mathrm{dom}f$. We already know that $(h_k^Z)$ Γ-converges to $h^Z$. To prove (45), it remains to check that the sequence $(h_k^Z)$ is equicoercive (see [Mas93, ??]). For all $y\in Y$, Fenchel's inequality $g(y)-\langle x_o,y\rangle\ge-f(x_o)$ and (44) imply $h^Z(y)\ge-f(x_o)+\iota_{V-x_o}^*(-y)$. Since $-f(x_o)>-\infty$ and $\iota_{V-x_o}^*$ is coercive (Lemma 6.6), we obtain that $h^Z$ is coercive. As $(h_k^Z)$ converges to $h^Z$ uniformly on any compact subset of $Z$, it follows that $(h_k^Z)$ is equicoercive. This proves (45).

The case where $x_o\in\mathrm{cl\,dom}f\setminus\mathrm{dom}f$. In this case, there exists $x_o'\in\mathrm{dom}f$ such that $V'=x_o'+(V-x_o)/2=\{x\in X;\ |\langle2y_i,x-x_o'\rangle|\le1,\ i\le n\}\in\mathcal N(x_o')$ satisfies $x_o\in V'\subset V\subset U$. One deduces from the previous case that (45) holds true with $V'$ instead of $V$.

The case where $x_o\notin\mathrm{cl\,dom}f$. As $(h_k^Z)$ Γ-converges to $h^Z$, by [Bee93, Proposition 1.3.5] we have $\limsup_{k\to\infty}\inf_{y\in Z}h_k^Z(y)\le\inf_{y\in Z}h^Z(y)$. As $x_o\notin\mathrm{cl\,dom}f$, for any small enough $V\in\mathcal N(x_o)$, $\inf_{y\in Z}h^Z(y)=-\inf_{x\in V}f(x)=-\infty$. Therefore, $\lim_{k\to\infty}\inf_{y\in Z}h_k^Z(y)=\inf_{y\in Z}h^Z(y)=-\infty$, which is (45).

This completes the proof of Theorem 6.2. □

7. Γ-CONVERGENCE OF MINIMIZATION PROBLEMS UNDER CONSTRAINTS

As the subsequent theorem demonstrates, the notion of Γ-convergence is well designed for minimization problems. Let $(f_k)_{k\ge1}$ be a Γ-converging sequence of $(-\infty,\infty]$-valued functions on a metric space $X$. Let us denote its limit

$$\Gamma\text{-}\lim_{k\to\infty}f_k=f.$$

Let $\theta:X\to Y$ be a continuous function with values in another metric space $Y$. Assume that for each $k\ge1$, $f_k$ is coercive and also that the sequence $(f_k)_{k\ge1}$ is equi-coercive, i.e. for all $a\ge0$, $\bigcup_{k\ge1}\{f_k\le a\}$ is relatively compact in $X$.

Theorem 7.1. Under the above assumptions, the sequence of functions $(\psi_k)_{k\ge1}$ on $Y$ which is defined by

$$\psi_k(y):=\inf\{f_k(x);\ x\in X:\theta(x)=y\},\qquad y\in Y,\ k\ge1$$

Γ-converges to

$$\psi(y):=\inf\{f(x);\ x\in X:\theta(x)=y\},\qquad y\in Y.$$

In particular, for any $y^*\in Y$, there exists a sequence $(y_k^*)_{k\ge1}$ in $Y$ such that $\lim_{k\to\infty}y_k^*=y^*$ and $\lim_{k\to\infty}\inf\{f_k(x);\ x\in X:\theta(x)=y_k^*\}=\inf\{f(x);\ x\in X:\theta(x)=y^*\}\in(-\infty,\infty]$.

Moreover, if $y^*$ satisfies $\inf\{f(x);\ x\in X:\theta(x)=y^*\}<\infty$, then for each $k\ge1$, the minimization problem

$$f_k(x)\to\min;\qquad x\in X:\theta(x)=y_k^*$$

admits at least one minimizer $x_k\in X$. Any sequence $(x_k)_{k\ge1}$ of such minimizers admits at least one limit point and any such limit point is a solution to the minimization problem

$$f(x)\to\min;\qquad x\in X:\theta(x)=y^*.$$

The proof of this result, which is based on Lemmas 7.2 and 7.3 below, is postponed until after the proofs of these lemmas.
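A toy example, assumed here for illustration, shows why Theorem 7.1 only asserts the existence of approximating constraints $y_k^*$. Take $X=Y=\mathbb R$, let $\theta$ be the identity and let $f_k$ be the indicator of $\{1/k\}$. The Γ-limit $f$ of $(f_k)$ is the indicator of $\{0\}$. Hence $\psi(0)=0$, while $\psi_k(0)=+\infty$ for every $k$. Moving the constraint to $y_k^*=1/k$ restores convergence.

```python
import math


def f_k(x, k):
    """Indicator of the point {1/k}: 0 there, +infinity elsewhere (coercive)."""
    return 0.0 if abs(x - 1.0 / k) < 1e-12 else math.inf


def f(x):
    """The Gamma-limit of (f_k): the indicator of {0}."""
    return 0.0 if abs(x) < 1e-12 else math.inf


# With theta = identity, psi_k(y) = f_k(y) and psi(y) = f(y).
y_star = 0.0
psi_at_y_star = f(y_star)
fixed_constraint = [f_k(y_star, k) for k in range(1, 11)]         # psi_k(y*)
moving_constraint = [f_k(1.0 / k, k) for k in range(1, 11)]       # psi_k(y_k*), y_k* = 1/k
```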

Let $Y$ be another metric space. We consider a Γ-convergent sequence $(g_k)_{k\ge1}$ of $[0,\infty]$-valued functions on $X\times Y$ with

$$\Gamma\text{-}\lim_{k\to\infty}g_k=g.$$

Let us define for each $k\ge1$ and $y\in Y$,

$$\psi_k(y):=\inf_{x\in X}g_k(x,y),\qquad\psi(y):=\inf_{x\in X}g(x,y).$$

Assume that for each $k\ge1$, $g_k$ is coercive and also that the sequence $(g_k)_{k\ge1}$ is equi-coercive on $X\times Y$.

Lemma 7.2. Under the above assumptions on $(g_k)_{k\ge1}$, $\Gamma\text{-}\lim_{k\to\infty}\psi_k=\psi$ in $Y$.

Proof. Let us fix $y^*\in Y$ and prove that $\Gamma\text{-}\lim_{k\to\infty}\psi_k(y^*)=\psi(y^*)$. Since $g_k$ is assumed to be coercive, for every $y\in Y$, there exists $x_{k,y}\in X$ such that $\psi_k(y)=g_k(x_{k,y},y)$.

Lower bound. Let $(y_k)_{k\ge1}$ be any converging sequence in $Y$ such that $\lim_{k\to\infty}y_k=y^*$. We want to show that

$$\liminf_{k\to\infty}\psi_k(y_k)\ge\psi(y^*).$$

Suppose that $\liminf_{k\to\infty}\psi_k(y_k)<\infty$, since otherwise there is nothing to prove. We denote $x_k^*=x_{k,y_k}$. Then,

$$\liminf_{k\to\infty}\psi_k(y_k)=\liminf_{k\to\infty}g_k(x_k^*,y_k)\overset{(a)}{=}\lim_{m\to\infty}g_m(x_m^*,y_m)\overset{(b)}{=}\lim_{n\to\infty}g_n(x_n^*,y_n)$$

where the index $m$ at equality (a) means that we have extracted a subsequence such that $\liminf_{k\to\infty}=\lim_{m\to\infty}$. At equality (b), once again a new subsequence is extracted in order that $(x_n^*)_{n\ge1}$ converges to some limit point $x^*$:

$$\lim_{n\to\infty}x_n^*=x^*.$$

The existence of a limit point $x^*$ is ensured by our assumptions that $\liminf_{k\to\infty}\psi_k(y_k)<\infty$ and that $\bigcup_{k\ge1}\{g_k\le a\}$ is relatively compact for all $a\ge0$. Now, by filling the holes in an appropriate way, one can construct a sequence $(x_k)_{k\ge1}$ which admits $(x_n^*)_{n\ge1}$ as a subsequence and such that $\lim_{k\to\infty}x_k=x^*$. It follows that

$$\liminf_{k\to\infty}\psi_k(y_k)=\lim_{n\to\infty}g_n(x_n^*,y_n)\ge\liminf_{k\to\infty}g_k(x_k,y_k)\overset{(*)}{\ge}g(x^*,y^*)\ge\psi(y^*)$$

which is the desired result. At the inequality marked $(*)$, we have used our assumption that $\Gamma\text{-}\lim_{k\to\infty}g_k=g$.
Recovery sequence. Under our assumptions, the Γ-limit $g$ is coercive on $X\times Y$, see [Mas93, Thm 7.8]. It follows that $g(\cdot,y^*)$ is also coercive and that there exists $\bar x\in\mathrm{argmin}\,g(\cdot,y^*)$. Let $(x_k,y_k)_{k\ge1}$ be a recovery sequence of $(g_k)_{k\ge1}$ at $(\bar x,y^*)$. This means that $\lim_{k\to\infty}(x_k,y_k)=(\bar x,y^*)$ and $\limsup_{k\to\infty}g_k(x_k,y_k)\le g(\bar x,y^*)=\psi(y^*)$. We see eventually that

$$\limsup_{k\to\infty}\psi_k(y_k)\le\limsup_{k\to\infty}g_k(x_k,y_k)\le\psi(y^*),$$

which is the desired estimate. □
Let us fix $y^*\in Y$. By Lemma 7.2, there exists a sequence $(y_k^*)_{k\ge1}$ such that

$$\lim_{k\to\infty}y_k^*=y^*,\qquad\lim_{k\to\infty}\psi_k(y_k^*)=\psi(y^*).\tag{46}$$

Let us define

$$\varphi_k(x):=g_k(x,y_k^*),\qquad\varphi(x):=g(x,y^*),\qquad x\in X$$

for all $k\ge1$. Since $g_k$ is coercive, $\varphi_k$ is also coercive. In particular, if $\psi(y^*)=\inf_X\varphi<\infty$, then by (46) the minimum value $\psi_k(y_k^*)=\inf_X\varphi_k$ is finite for all large enough $k$, and it is therefore attained at some $x_k\in X$.



Lemma 7.3. In addition to the assumptions of Lemma 7.2, suppose that $\inf_X\varphi<\infty$. For each $k$, let $x_k$ be a minimizer of $\varphi_k$. Then the sequence $(x_k)_{k\ge1}$ admits limit points in $X$ and any limit point is a minimizer of $\varphi$.

Proof. We have already noticed that for each $k$, $\varphi_k$ is coercive so that it admits one or several minimizers. Since $\lim_{k\to\infty}\inf_X\varphi_k=\inf_X\varphi<\infty$, we see that $\sup_k\inf_X\varphi_k<\infty$ (discarding finitely many indices if necessary). It follows from the assumed relative compactness of $\bigcup_{k\ge1}\{g_k\le a\}$ for all $a\ge0$ that $\bigcup_{k\ge1}\mathrm{argmin}\,\varphi_k$ is also relatively compact. Therefore any sequence $(x_k)_{k\ge1}$ of minimizers $x_k\in\mathrm{argmin}\,\varphi_k$ admits at least one limit point.

As $\varphi_k(x_k)=\psi_k(y_k^*)$, we see with (46) that

$$\lim_{k\to\infty}\varphi_k(x_k)=\inf_X\varphi.$$

On the other hand, let $\bar x$ be any limit point of $(x_k)_{k\ge1}$. There exists a subsequence (indexed by $m$ with an abuse of notation) such that $\lim_{m\to\infty}x_m=\bar x$. Because of the assumed Γ-limit $\Gamma\text{-}\lim_{k\to\infty}g_k=g$, we obtain

$$\varphi(\bar x)=g(\bar x,y^*)\le\liminf_{m\to\infty}g_m(x_m,y_m^*)=\liminf_{m\to\infty}\varphi_m(x_m)=\lim_{k\to\infty}\varphi_k(x_k)=\inf_X\varphi.$$

It follows that $\bar x$ is a minimizer of $\varphi$. □
Proof of Theorem 7.1. Consider the functions

$$g_k(x,y):=f_k(x)+\iota_{\{y=\theta(x)\}},\qquad(x,y)\in X\times Y,$$

for each $k\ge1$ and

$$g(x,y):=f(x)+\iota_{\{y=\theta(x)\}},\qquad(x,y)\in X\times Y.$$

Because of Lemmas 7.2, 7.3 and (46), to complete the proof it is enough to show that

$$\Gamma\text{-}\lim_{k\to\infty}g_k=g\tag{47}$$

together with the coerciveness assumptions of these lemmas.

Let us begin with the coerciveness. Since for each $k\ge1$, $f_k$ is coercive and $\theta$ is continuous, we see that for any large enough $a$, $\{g_k\le a\}=\{(x,y)\in X\times Y;\ x\in\{f_k\le a\},\ y=\theta(x)\}$ is compact, i.e. for each $k\ge1$, $g_k$ is coercive. As $(f_k)_{k\ge1}$ is assumed to be equi-coercive, its Γ-limit $f$ is coercive and it follows by the same argument that $g$ is also coercive. We also see that $\bigcup_{k\ge1}\{g_k\le a\}=\{(x,y)\in X\times Y;\ x\in\bigcup_{k\ge1}\{f_k\le a\},\ y=\theta(x)\}$ is relatively compact, i.e. $(g_k)_{k\ge1}$ is equi-coercive.
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Let us prove that (pETj) holds true. Let (x,y) £ X x Y be fixed. We have to prove that: 

(i) For any sequence $(x_k,y_k)_{k\ge1}$ such that $\lim_{k\to\infty}(x_k,y_k)=(x,y)$,
$$\liminf_{k\to\infty}\big[f_k(x_k)+\iota_{\{y_k=\theta(x_k)\}}\big]\ge f(x)+\iota_{\{y=\theta(x)\}};$$

(ii) There exists a sequence $(x_k,y_k)_{k\ge1}$ such that $\lim_{k\to\infty}(x_k,y_k)=(x,y)$ and
$$\liminf_{k\to\infty}\big[f_k(x_k)+\iota_{\{y_k=\theta(x_k)\}}\big]\le f(x)+\iota_{\{y=\theta(x)\}}.$$

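Suppose first that $y\ne\theta(x)$. Then (ii) is obvious, since its right-hand side is $+\infty$. Moreover, due to the continuity of $\theta$, for any sequence $(x_k,y_k)_{k\ge1}$ such that $\lim_{k\to\infty}(x_k,y_k)=(x,y)$, we have $\theta(x_k)\ne y_k$ for all large enough $k$, so that the left-hand side of (i) is also $+\infty$. This proves (i).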

Now, suppose that $y=\theta(x)$. Then (i) follows from
$$\liminf_{k\to\infty}\big[f_k(x_k)+\iota_{\{y_k=\theta(x_k)\}}\big]\ge\liminf_{k\to\infty}f_k(x_k)\ge f(x)=f(x)+\iota_{\{y=\theta(x)\}},$$
whenever $\lim_{k\to\infty}x_k=x$. To prove (ii), take a recovering sequence $(x_k)_{k\ge1}$ for $(f_k)_{k\ge1}$ at $x$, i.e. $\lim_{k\to\infty}x_k=x$ and $\liminf_{k\to\infty}f_k(x_k)\le f(x)$, and put $y_k=\theta(x_k)$ for each $k\ge1$. By the continuity of $\theta$, $\lim_{k\to\infty}y_k=y$, so that $\lim_{k\to\infty}(x_k,y_k)=(x,y)$. We also have
$$\liminf_{k\to\infty}\big[f_k(x_k)+\iota_{\{y_k=\theta(x_k)\}}\big]=\liminf_{k\to\infty}f_k(x_k)\le f(x)=f(x)+\iota_{\{y=\theta(x)\}},$$
which proves (ii) and completes the proof of the theorem. □
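The role of the indicator term can be seen on a toy instance, again a sketch under our own assumptions: $X=\mathbb{R}^2$ (the second coordinate discretized), $Y=\mathbb{R}$, the continuous map $\theta(x_1,x_2)=x_1$ and the hypothetical functions $f_k(x)=(x_1-x_2)^2+x_2^2+\sin(kx_2)/k$, which converge uniformly to $f(x)=(x_1-x_2)^2+x_2^2$. Since $\inf_x g_k(x,y)=\inf\{f_k(x);\ \theta(x)=y\}$, minimizing $g_k(\cdot,y)$ amounts to minimizing $f_k$ on the fiber $\{\theta=y\}$, and these constrained values approach $\inf\{f(x);\ \theta(x)=y\}=y^2/2$.

```python
import math

def f_k(x1: float, x2: float, k: int) -> float:
    """Hypothetical f_k on R^2: f below plus an oscillation of size at most 1/k."""
    return (x1 - x2) ** 2 + x2 ** 2 + math.sin(k * x2) / k

def f(x1: float, x2: float) -> float:
    """Uniform limit of f_k."""
    return (x1 - x2) ** 2 + x2 ** 2

# theta(x1, x2) = x1 is continuous; the fiber {theta = y} is the line x1 = y.
X2_GRID = [-4.0 + 8.0 * i / 40000 for i in range(40001)]

def constrained_inf(func, y: float) -> float:
    """inf over x2 of func(y, x2): the indicator term of g(., y) forces x1 = theta(x) = y."""
    return min(func(y, x2) for x2 in X2_GRID)

table = {}
for y in (-1.0, 0.5, 2.0):
    exact = y * y / 2  # inf { f(x) : x1 = y }, attained at x2 = y / 2
    for k in (10, 100, 1000):
        table[(y, k)] = constrained_inf(lambda a, b: f_k(a, b, k), y)
    print(y, exact, constrained_inf(f, y), [round(table[(y, k)], 5) for k in (10, 100, 1000)])
```

Because $|f_k-f|\le1/k$ everywhere, the constrained values differ from $y^2/2$ by at most $1/k$ here.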



Appendix A. Large deviations

Large deviation principle. We refer to the monograph by Dembo and Zeitouni [DZ98] for a clear exposition of the subject. Let $\mathcal{X}$ be a Polish space furnished with its Borel $\sigma$-field. One says that the sequence $(\gamma_n)_{n\ge1}$ of probability measures on $\mathcal{X}$ satisfies the large deviation principle (LDP for short) with scale $n$ and rate function $I$ if, for each Borel measurable subset $A$ of $\mathcal{X}$, we have

$$-\inf_{x\in\operatorname{int}A}I(x)\ \overset{\text{(i)}}{\le}\ \liminf_{n\to\infty}\frac1n\log\gamma_n(A)\le\limsup_{n\to\infty}\frac1n\log\gamma_n(A)\ \overset{\text{(ii)}}{\le}\ -\inf_{x\in\operatorname{cl}A}I(x), \tag{48}$$

where $\operatorname{int}A$ and $\operatorname{cl}A$ are respectively the topological interior and closure of $A$ in $\mathcal{X}$, and the rate function $I:\mathcal{X}\to[0,\infty]$ is lower semicontinuous. The inequalities (i) and (ii) are called respectively the LD lower bound and the LD upper bound, where LD is an abbreviation for large deviation. The LDP is the exact statement of what was meant in the previous section when writing

$$\gamma_n(A)\underset{n\to\infty}{\asymp}\exp\Big(-n\inf_{x\in A}I(x)\Big)$$

for "all" $A\subset\mathcal{X}$.

It is sometimes too demanding to require the upper bound $\limsup_{n\to\infty}\frac1n\log\gamma_n(C)\le-\inf_{x\in C}I(x)$ for all closed sets $C$. One says that the weak LD upper bound holds if

$$\limsup_{n\to\infty}\frac1n\log\gamma_n(K)\le-\inf_{x\in K}I(x)$$

for every compact subset $K$ of $\mathcal{X}$. In case $(\gamma_n)_{n\ge1}$ satisfies the LD lower bound (for all open subsets) and the weak LD upper bound (for all compact subsets), one says that $(\gamma_n)_{n\ge1}$ satisfies the weak LDP.

An important instance of the large deviation principle is given by Sanov's theorem. Consider a probability measure $R\in P(X)$ on the Polish space $X$ and furnish $P(X)$ with the narrow topology $\sigma(P(X),C_b(X))$ and the corresponding Borel $\sigma$-field. Let $Z_1,Z_2,\dots$ be a sequence of independent $X$-valued random variables with common law $R$, i.e. $\mathbb{P}(Z_i\in B)=R(B)$ for any Borel measurable subset $B\subset X$ and any $i\ge1$. In other words,

$$(Z_1,\dots,Z_n)_\#\mathbb{P}=R^{\otimes n}\quad\text{for all } n\ge1.$$
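The bounds (48) can be observed on the simplest example, which is our own choice and not the paper's: $\mathcal{X}=[0,1]$, $\gamma_n$ is the law of the mean $S_n/n$ of $n$ independent Bernoulli($p$) variables, and, by Cramér's theorem, the rate function is $I(a)=a\log(a/p)+(1-a)\log\big((1-a)/(1-p)\big)$. For $A=[a,1]$ with $a>p$, both infima in (48) equal $I(a)$. The sketch computes $\frac1n\log\gamma_n(A)$ exactly from binomial weights; the function names are hypothetical.

```python
import math

def rate(a: float, p: float) -> float:
    """Cramer rate function I(a) of Bernoulli(p) sample means, for a in [0, 1]."""
    out = 0.0
    if a > 0:
        out += a * math.log(a / p)
    if a < 1:
        out += (1 - a) * math.log((1 - a) / (1 - p))
    return out

def log_tail(n: int, a: float, p: float) -> float:
    """log gamma_n([a, 1]) = log P(S_n >= a n) for S_n ~ Binomial(n, p), via log-sum-exp."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log(1 - p)
        for j in range(math.ceil(a * n), n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

p, a = 0.3, 0.5  # A = [a, 1] with a > p
for n in (10, 100, 1000, 10000):
    print(n, round(log_tail(n, a, p) / n, 5), round(-rate(a, p), 5))
```

The printed ratios approach $-I(a)$ at the rate $O(\log n/n)$; for this particular set, the upper bound (ii) even holds for every $n$ (Chernoff's inequality).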
Theorem A.1 (Sanov's theorem). Under the above assumptions, the empirical measure

$$L^n:=\frac1n\sum_{i=1}^n\delta_{Z_i}\in P(X)$$

satisfies the LDP¹ in $P(X)$ with scale $n$. Its rate function is $H(\cdot|R):P(X)\to[0,\infty]$, the relative entropy with respect to the reference probability measure $R$.

Here, the LDP holds in $\mathcal{X}=P(X)$ and, for each $n$, $\gamma_n=(L^n)_\#\mathbb{P}\in P(P(X))$. For a proof of this result, see [DZ98, Thm 6.2.10].
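The mechanism behind Theorem A.1 is visible on a finite space $X=\{0,\dots,d-1\}$ (the method of types): $nL^n$ is a multinomial vector, so $\mathbb{P}(L^n=Q)$ has a closed form whenever $nQ$ has integer entries, and $\frac1n\log\mathbb{P}(L^n=Q)\to-H(Q|R)$ with $H(Q|R)=\sum_xQ(x)\log\big(Q(x)/R(x)\big)$. The measures $R$ and $Q$ below are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math

def relative_entropy(q, r):
    """H(Q|R) = sum_x Q(x) log(Q(x) / R(x)) on a finite set; terms with Q(x) = 0 vanish."""
    return sum(qi * math.log(qi / ri) for qi, ri in zip(q, r) if qi > 0)

def log_prob_type(counts, r):
    """log P(n L^n = counts) for n = sum(counts) i.i.d. draws with law r (multinomial)."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(ri) for c, ri in zip(counts, r) if c > 0)

R = [0.5, 0.3, 0.2]  # reference law on X = {0, 1, 2}
Q = [0.2, 0.3, 0.5]  # target type
for n in (10, 100, 1000, 100000):
    counts = [round(n * qi) for qi in Q]  # n Q has integer entries for these n
    print(n, round(log_prob_type(counts, R) / n, 5), round(-relative_entropy(Q, R), 5))
```

The exact probabilities never exceed $\exp(-nH(Q|R))$, and the polynomial prefactor of the multinomial coefficient is what disappears at the exponential scale $n$.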

The next theorem states that the continuous image of an LDP is still an LDP with the same scale.

Theorem A.2 (Contraction principle). Let $(\gamma_n)_{n\ge1}$ satisfy the LDP in $\mathcal{X}$ with scale $n$ and rate function $I$. Suppose in addition that $I$ is not only lower semicontinuous, but also coercive. For any continuous function $f:\mathcal{X}\to\mathcal{Y}$ from $\mathcal{X}$ to another Polish space $\mathcal{Y}$ furnished with its Borel $\sigma$-field,

$$(f_\#\gamma_n)_{n\ge1}$$

satisfies the LDP in $\mathcal{Y}$ with scale $n$ and the rate function

$$J(y)=\inf\{I(x);\ x\in\mathcal{X}:f(x)=y\},\quad y\in\mathcal{Y}.$$

Moreover, $J$ is also coercive.

For a proof, see [DZ98, Thm 4.2.1].
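Theorem A.2 can be tested under assumptions of our own: let $\gamma_n$ be the law of the mean $S_n/n$ of $n$ independent Bernoulli($p$) variables, whose rate function on $[0,1]$ is Cramér's $I(a)=a\log(a/p)+(1-a)\log\big((1-a)/(1-p)\big)$, and push it forward by the hypothetical continuous map $f(x)=(x-p)^2$ from $[0,1]$ to $\mathcal{Y}=\mathbb{R}$. For the closed set $B=[y_0,\infty)$, the contraction principle gives $\inf_{y\in B}J(y)=\inf\{I(x);\ f(x)\in B\}$, which the sketch evaluates on a grid and compares with $\frac1n\log\mathbb{P}(f(S_n/n)\in B)$.

```python
import math

def rate(a: float, p: float) -> float:
    """Cramer rate function I(a) of Bernoulli(p) sample means, for a in [0, 1]."""
    out = 0.0
    if a > 0:
        out += a * math.log(a / p)
    if a < 1:
        out += (1 - a) * math.log((1 - a) / (1 - p))
    return out

def log_prob(n: int, p: float, event) -> float:
    """log P(event(S_n / n)) for S_n ~ Binomial(n, p), via log-sum-exp."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log(1 - p)
        for j in range(n + 1) if event(j / n)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

p = 0.3

def f(x: float) -> float:
    """Hypothetical continuous map from [0, 1] to Y = R."""
    return (x - p) ** 2

y0 = 0.045  # the closed set B = [y0, +inf) in Y
# Contraction principle: inf_{y in B} J(y) = inf { I(x) : f(x) in B }, here on a grid.
inf_J = min(rate(i / 10**5, p) for i in range(10**5 + 1) if f(i / 10**5) >= y0)
s = math.sqrt(y0)
closed_form = min(rate(p + s, p), rate(p - s, p))  # I is monotone on each side of p
print(inf_J, closed_form)
for n in (100, 1000, 10000):
    print(n, round(log_prob(n, p, lambda x: f(x) >= y0) / n, 5), round(-inf_J, 5))
```

Computing $J$ explicitly is not needed: the infimum over the fiber $f^{-1}(B)$ is taken directly, which is exactly what the formula for $J$ encodes.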

Let us look at an example of application of the contraction principle which is in the spirit of this article. Consider an independent sequence of identically distributed random paths, i.e. $(Z_1,\dots,Z_n)_\#\mathbb{P}=R^{\otimes n}$ for all $n\ge1$, where the reference probability measure $R$ belongs to $P(\Omega)$. The empirical measure $L^n$ is a $P(\Omega)$-valued random variable. Now let $f$ be the marginal projection

$$f(P)=(P_0,P_1)\in P(X)\times P(X),\quad P\in P(\Omega),$$

where $P_t:=(X_t)_\#P$ for $t=0,1$. It is a continuous function. This is clear when $\Omega=C([0,1],X)$, and it remains true when $\Omega=D([0,1],X)$ ($t=0,1$ being the initial and final times, $X_0$ and $X_1$ turn out to be Skorokhod-continuous). Using the notation of the previous section, we see that

$$f(L^n)=(L^n_0,L^n_1).$$

By Sanov's theorem, the sequence of empirical measures $L^n$ satisfies the LDP in $P(\Omega)$ with scale $n$ and rate function $H(\cdot|R)$. Applying the contraction principle with $f$ as above, we see that $(L^n_0,L^n_1)_{n\ge1}$ satisfies the LDP in $P(X)\times P(X)$ with scale $n$ and rate function

$$J(\mu_0,\mu_1)=\inf\{H(P|R);\ P\in P(\Omega):P_0=\mu_0,\ P_1=\mu_1\}\in[0,\infty],\quad \mu_0,\mu_1\in P(X);$$

compare (S).
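In a finite-state toy model the infimum defining $J(\mu_0,\mu_1)$ can be computed. As an assumption made only for this illustration, replace path measures by their endpoint laws, that is, work with the static analogue $\min\{H(\pi|R_{01});\ \pi\in P(X\times X),\ \pi_0=\mu_0,\ \pi_1=\mu_1\}$ on a finite $X$, where $R_{01}$ plays the role of the endpoint law of $R$. When $R_{01}$ is positive, the minimizer has the form $\pi(x,y)=a(x)R_{01}(x,y)b(y)$, and Sinkhorn's alternating scaling (iterative proportional fitting) finds $a$ and $b$. The matrix and marginals are arbitrary, and the function names are hypothetical.

```python
import math

def sinkhorn(R, mu0, mu1, iters=2000):
    """Hypothetical helper: minimize H(pi|R) over couplings pi of (mu0, mu1).

    Alternately rescales rows and columns of R (iterative proportional fitting),
    so that pi(x, y) = a(x) R(x, y) b(y) throughout.
    """
    m, k = len(mu0), len(mu1)
    a, b = [1.0] * m, [1.0] * k
    for _ in range(iters):
        # Match the first marginal: a(x) * sum_y R(x, y) b(y) = mu0(x).
        a = [mu0[i] / sum(R[i][j] * b[j] for j in range(k)) for i in range(m)]
        # Match the second marginal: b(y) * sum_x a(x) R(x, y) = mu1(y).
        b = [mu1[j] / sum(a[i] * R[i][j] for i in range(m)) for j in range(k)]
    return [[a[i] * R[i][j] * b[j] for j in range(k)] for i in range(m)]

def relative_entropy(P, R):
    """H(P|R) for two probability matrices with P << R."""
    return sum(p * math.log(p / r)
               for prow, rrow in zip(P, R) for p, r in zip(prow, rrow) if p > 0)

R01 = [[0.20, 0.10, 0.05],   # arbitrary positive joint law, total mass 1
       [0.10, 0.20, 0.10],
       [0.05, 0.10, 0.10]]
mu0 = [0.5, 0.3, 0.2]
mu1 = [0.2, 0.3, 0.5]
pi = sinkhorn(R01, mu0, mu1)
print([round(sum(row), 6) for row in pi])                       # first marginal
print([round(sum(row[j] for row in pi), 6) for j in range(3)])  # second marginal
print(relative_entropy(pi, R01))                                 # value of the static toy problem
```

The scaling form $aR_{01}b$ is exactly the Lagrange condition $\log(\pi/R_{01})=\log a(x)+\log b(y)$ for the two marginal constraints, so once both marginals are matched the coupling is optimal.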

¹ This is an abuse of definition. The correct statement should be: the sequence $((L^n)_\#\mathbb{P})_{n\ge1}$ satisfies the LDP.

Theorem A.3 (Laplace-Varadhan principle). Suppose that $(\gamma_n)_{n\ge1}$ satisfies the LDP in $\mathcal{X}$ with a coercive rate function $I:\mathcal{X}\to[0,\infty]$, and let $f$ be a continuous function on $\mathcal{X}$. Assume further that

$$\lim_{M\to\infty}\limsup_{n\to\infty}\frac1n\log\int_{\mathcal{X}}e^{nf(x)}\mathbf{1}_{\{f(x)\ge M\}}\,\gamma_n(dx)=-\infty.$$

Then,

$$\lim_{n\to\infty}\frac1n\log\int_{\mathcal{X}}e^{nf(x)}\,\gamma_n(dx)=\sup_{x\in\mathcal{X}}\{f(x)-I(x)\}.$$

For a proof, see [DZ98, Thm 4.3.1].
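A numerical sketch of Theorem A.3 under our own assumptions: $\gamma_n$ is the law of the mean of $n$ independent Bernoulli($p$) variables, with Cramér's rate function $I(a)=a\log(a/p)+(1-a)\log\big((1-a)/(1-p)\big)$ on $\mathcal{X}=[0,1]$, and $f(x)=-4(x-0.8)^2$ is a hypothetical bounded continuous function. Since $f$ is bounded, the tail condition holds trivially: for $M>\sup f$ the integral over $\{f\ge M\}$ vanishes. The code compares $\frac1n\log\int e^{nf}\,d\gamma_n$, computed exactly, with a grid approximation of $\sup_x\{f(x)-I(x)\}$.

```python
import math

def rate(a: float, p: float) -> float:
    """Cramer rate function I(a) of Bernoulli(p) sample means, for a in [0, 1]."""
    out = 0.0
    if a > 0:
        out += a * math.log(a / p)
    if a < 1:
        out += (1 - a) * math.log((1 - a) / (1 - p))
    return out

def log_laplace(n: int, p: float, f) -> float:
    """log E[exp(n f(S_n / n))] for S_n ~ Binomial(n, p), via log-sum-exp."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log(1 - p) + n * f(j / n)
        for j in range(n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

p = 0.3

def f(x: float) -> float:
    """Hypothetical bounded continuous function on [0, 1]."""
    return -4.0 * (x - 0.8) ** 2

# Right-hand side of Theorem A.3, approximated on a fine grid of [0, 1].
variational = max(f(i / 10**5) - rate(i / 10**5, p) for i in range(10**5 + 1))
for n in (10, 100, 1000, 10000):
    print(n, round(log_laplace(n, p, f) / n, 5), round(variational, 5))
```

The maximizer of $f-I$ sits strictly between the mean $p$ and the peak $0.8$ of $f$: the exponential weight $e^{nf}$ tilts $\gamma_n$ away from its typical behaviour at an entropic cost measured by $I$.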